A Comprehensive Overview of Cross-Social Network User Matching Using Spatial-Temporal Transformer-Encoder

Introduction

The rapid development of the internet has made social networks ubiquitous. Because platforms differ in function, users often register accounts on multiple social networks to meet different needs. This creates a need for cross-social network user matching, which plays a crucial role in personalized recommendation, targeted advertising, and privacy protection. Among the various types of user data, spatial-temporal check-in data is particularly valuable because it is distinctive and hard to forge, making it highly reliable for user identification.

Existing methods for cross-social network user matching often process spatial and temporal information separately, discarding the intrinsic coupling between the two dimensions. This makes feature extraction harder and lowers matching accuracy. To address these challenges, this paper proposes a novel method called User Matching Method for Cross Social Networks based on Spatial-Temporal Transformer-encoder (UMMSTT), which integrates spatial and temporal information to improve matching performance.

Background and Related Work

Previous research on cross-social network user matching has explored various approaches, including trajectory-based methods, graph-based techniques, and deep learning models. Some studies focus on extracting user location features using embedding techniques, while others leverage temporal patterns or social network structures. However, most existing methods suffer from one or more of the following limitations:

  1. Independent Processing of Spatial and Temporal Data – Many approaches treat spatial and temporal features separately, ignoring their interdependencies.
  2. Feature Extraction Challenges – Some models struggle to capture high-dimensional patterns in sparse or noisy check-in data.
  3. Scalability Issues – Graph-based methods often face computational inefficiencies when applied to large-scale datasets.
  4. Data Sparsity – Certain techniques perform poorly when user check-in data is limited.

To overcome these issues, UMMSTT introduces an optimized Transformer-encoder architecture combined with convolutional neural networks (CNNs) to enhance feature extraction and matching accuracy.

Methodology

  1. Data Preprocessing and Grid Mapping

The proposed method begins by converting raw check-in data into structured sequences through grid mapping. This step simplifies the data while preserving essential spatial-temporal relationships. Two types of grid mapping techniques are employed:

• Independent Spatial-Temporal Grid Mapping – Separately processes spatial and temporal data into 2D grids.

• Joint Spatial-Temporal Grid Mapping – Integrates spatial and temporal dimensions into a unified 3D grid, capturing their coupling effects.

The grid mapping process involves discretizing check-in coordinates and timestamps into sub-grid indices, which are then used to construct user check-in sequences. These sequences serve as input for subsequent feature extraction.
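As an illustration, the joint spatial-temporal mapping can be sketched as follows. The grid resolutions and the index-flattening scheme below are hypothetical, since the paper's actual cell sizes are not specified here:

```python
# Hypothetical grid parameters -- the paper's actual cell sizes are not given.
LAT_MIN, LAT_MAX = -90.0, 90.0
LON_MIN, LON_MAX = -180.0, 180.0
N_LAT, N_LON, N_TIME = 100, 200, 24  # spatial cells and hourly time slots

def joint_grid_index(lat, lon, hour):
    """Map a check-in (lat, lon, hour-of-day) to a single 3D sub-grid token."""
    i = min(int((lat - LAT_MIN) / (LAT_MAX - LAT_MIN) * N_LAT), N_LAT - 1)
    j = min(int((lon - LON_MIN) / (LON_MAX - LON_MIN) * N_LON), N_LON - 1)
    k = hour % N_TIME
    # Flatten (i, j, k) into one token id so a sequence model sees spatial
    # and temporal position as a single coupled symbol, not two streams.
    return (i * N_LON + j) * N_TIME + k

def checkins_to_sequence(checkins):
    """Convert a list of (lat, lon, hour) check-ins to a token sequence."""
    return [joint_grid_index(lat, lon, h) for lat, lon, h in checkins]
```

Independent mapping would instead produce two separate index sequences (one spatial, one temporal); flattening into a single token is what preserves the coupling.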

  2. Sequence Embedding

Discrete check-in sequences are transformed into continuous high-dimensional vectors using an embedding layer. This step ensures that the model can effectively process sequential patterns in the data. The embedding layer maps each check-in point to a dense vector representation, facilitating further feature extraction.
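A minimal embedding lookup might look like this; the vocabulary size and embedding dimension are illustrative, and in practice the table is learned during training rather than randomly fixed:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB_SIZE = 480_000  # hypothetical: one entry per joint-grid cell
EMBED_DIM = 64

# Embedding table: each discrete grid token maps to a dense vector.
embedding_table = rng.normal(scale=0.02, size=(VOCAB_SIZE, EMBED_DIM))

def embed_sequence(token_ids):
    """Look up dense vectors for a check-in token sequence -> (seq_len, dim)."""
    return embedding_table[np.asarray(token_ids)]

vecs = embed_sequence([242412, 7, 99])
```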

  3. Optimized Transformer-Encoder for Feature Extraction

The core innovation of UMMSTT lies in its optimized Transformer-encoder architecture, which enhances feature extraction through the following components:

Multi-Head Self-Attention Mechanism
• Captures long-range dependencies in check-in sequences.

• Computes attention weights to identify significant spatial-temporal patterns.

• Processes input embeddings in parallel to improve efficiency.

Convolutional Neural Network (CNN) Integration
• Enhances local feature extraction by applying convolutional filters.

• Introduces noise for data augmentation, improving model robustness.

• Optimizes weight transformation and feature fusion through a secondary CNN layer.

Residual Connections and Normalization
• Stabilizes training by preventing gradient vanishing.

• Ensures smooth feature propagation across layers.

This hybrid architecture reduces computational complexity while maintaining high accuracy, making it suitable for large-scale datasets.
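For reference, a plain self-attention encoder layer with residual connections and normalization can be sketched in NumPy as follows. This is a generic Transformer-encoder block, not the paper's CNN-augmented variant, and all dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
D_MODEL, N_HEADS, D_FF = 64, 4, 128

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(-1, keepdims=True) + eps)

def multi_head_self_attention(x, Wq, Wk, Wv, Wo, n_heads):
    """x: (seq, d_model). Split projections into heads, attend, recombine."""
    seq, d = x.shape
    dh = d // n_heads
    q = (x @ Wq).reshape(seq, n_heads, dh).transpose(1, 0, 2)  # (h, seq, dh)
    k = (x @ Wk).reshape(seq, n_heads, dh).transpose(1, 0, 2)
    v = (x @ Wv).reshape(seq, n_heads, dh).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)            # (h, seq, seq)
    out = softmax(scores) @ v                                  # (h, seq, dh)
    return out.transpose(1, 0, 2).reshape(seq, d) @ Wo

def encoder_block(x, p, n_heads=N_HEADS):
    """One encoder layer: attention + residual + norm, then position-wise FFN."""
    a = multi_head_self_attention(x, p["Wq"], p["Wk"], p["Wv"], p["Wo"], n_heads)
    x = layer_norm(x + a)                  # residual connection stabilises training
    h = np.maximum(0, x @ p["W1"]) @ p["W2"]
    return layer_norm(x + h)

shapes = {"Wq": (D_MODEL, D_MODEL), "Wk": (D_MODEL, D_MODEL),
          "Wv": (D_MODEL, D_MODEL), "Wo": (D_MODEL, D_MODEL),
          "W1": (D_MODEL, D_FF), "W2": (D_FF, D_MODEL)}
params = {k: rng.normal(scale=0.02, size=s) for k, s in shapes.items()}

out = encoder_block(rng.normal(size=(10, D_MODEL)), params)
```

In UMMSTT, convolutional layers would additionally operate on these representations for local feature extraction and fusion; that part is omitted here.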

  4. User Matching Classifier

The extracted high-dimensional features are flattened into a 1D vector and fed into a feedforward neural network (FFN) for classification. The FFN learns the relationship between feature representations and user identity matching, outputting a similarity score. A threshold-based decision mechanism determines whether two accounts belong to the same user.
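A minimal sketch of such a classifier, with randomly initialized (untrained) weights purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
SEQ_LEN, D_MODEL, HIDDEN = 10, 64, 32

# Hypothetical weights -- in practice these are learned jointly with the encoder.
w1 = rng.normal(scale=0.05, size=(SEQ_LEN * D_MODEL, HIDDEN))
b1 = np.zeros(HIDDEN)
w2 = rng.normal(scale=0.05, size=HIDDEN)
b2 = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def match_score(features):
    """Flatten encoder output and score the account pair with a small FFN."""
    x = features.reshape(-1)         # (seq_len, d_model) -> 1D vector
    h = np.maximum(0, x @ w1 + b1)   # hidden ReLU layer
    return sigmoid(h @ w2 + b2)      # similarity score in (0, 1)

def is_same_user(features, threshold=0.5):
    """Threshold-based decision: do the two accounts belong to one user?"""
    return bool(match_score(features) >= threshold)

score = match_score(rng.normal(size=(SEQ_LEN, D_MODEL)))
```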

Experimental Evaluation

Datasets and Evaluation Metrics

Experiments were conducted on two real-world social network datasets: Brightkite and Gowalla, which contain extensive check-in records with timestamps and geographic coordinates. The datasets were split into training (80%) and testing (20%) sets, with balanced positive and negative samples.

Performance was evaluated using standard metrics:
• Accuracy (Acc) – Measures overall prediction correctness.

• Precision (Pre) – Indicates the proportion of true matches among predicted positives.

• Recall (Rec) – Measures the model’s ability to identify all true matches.

• F1-Score (F1) – Harmonic mean of precision and recall.
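All four metrics follow directly from the binary confusion matrix; a small self-contained implementation:

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1 from binary labels and predictions."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    acc = (tp + tn) / len(y_true)
    pre = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
    return acc, pre, rec, f1
```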

Results and Analysis

UMMSTT demonstrated significant improvements over existing methods:

  1. Superior Matching Accuracy
    • Achieved 99.31% accuracy on Brightkite and 99.38% on Gowalla.

    • Outperformed baseline models by 0.40–10.53 percentage points in accuracy.

  2. Enhanced Feature Extraction
    • The joint spatial-temporal grid mapping improved feature representation compared to independent processing.

    • The optimized Transformer-encoder reduced computational overhead while maintaining high performance.

  3. Robustness to Data Sparsity
    • The model maintained high accuracy even with limited check-in data, thanks to data augmentation techniques.

Ablation Studies

Ablation experiments confirmed the contributions of key components:
• Joint Spatial-Temporal Grid Mapping – Improved accuracy by 0.30–0.37 percentage points over independent mapping.

• CNN Integration – Enhanced feature fusion and reduced training time.

• Multi-Head Attention – Effectively captured sequential dependencies.

Practical Applications

The proposed method has several real-world applications, including:

  1. Personalized Recommendations – Linking user accounts across platforms enables better content and location-based suggestions.
  2. Fraud Detection – Identifying duplicate or fake accounts by analyzing check-in patterns.
  3. Social Network Integration – Facilitating seamless user experiences across multiple platforms.

A case study on a travel recommendation system demonstrated UMMSTT’s effectiveness, achieving a 98.5% success rate in matching user accounts across networks.

Conclusion

This paper introduced UMMSTT, a novel method for cross-social network user matching that leverages spatial-temporal Transformer-encoders and CNNs. By integrating spatial and temporal features through joint grid mapping and optimized attention mechanisms, the model achieved state-of-the-art performance on real-world datasets. Future work may explore additional data modalities (e.g., social connections, text content) to further enhance matching accuracy.

DOI: 10.19734/j.issn.1001-3695.2024.05.0146
