Fine-Grained Spatio-Temporal Multi-Semantic Hypergraph Learning for Next Point-of-Interest Recommendation


Introduction

In the era of information technology, personalized services have become an indispensable bridge connecting users with vast amounts of data. As a key technology in intelligent tourism, local lifestyle services, and e-commerce, Point-of-Interest (PoI) recommendation plays a crucial role in discovering potential interests and alleviating information overload. With the rapid development of location-based social networks (LBSNs) such as Foursquare and Facebook Places, users can share their geographical locations by checking in at various PoIs. These PoIs represent specific locations like restaurants, gyms, or shopping centers that may be useful or interesting to users. By analyzing user check-in records, it becomes possible to understand user movement patterns and recommend suitable PoIs.

Traditional PoI recommendation methods focus on exploring users’ long-term preferences, whereas next PoI recommendation aims to uncover sequential dependencies between users and PoIs while incorporating recent spatio-temporal information to provide appropriate recommendations at specific time points. Current next PoI recommendation approaches can be broadly categorized into sequence-based and graph-based methods. Sequence-based methods treat next PoI recommendation as a sequence prediction task, employing sequential modeling techniques to capture transition patterns. These methods have evolved from traditional Markov chains to more advanced recurrent neural networks (RNNs) and self-attention mechanisms. However, they primarily focus on sequential pattern mining and often fail to fully explore higher-order user interactions.

Inspired by graph neural networks (GNNs), graph-based recommendation methods leverage GNNs to capture higher-order collaborative signals and model complex neighborhood relationships, achieving significant performance improvements in next PoI recommendation. Nevertheless, most existing methods only utilize direct positional and temporal relationships between PoIs to learn embeddings, lacking deeper exploration of abstract spatial and temporal features as well as higher-order user interactions. Recent studies have attempted to address this limitation by using hypergraphs to model higher-order relationships among users. However, these approaches still face several challenges:

First, current models primarily rely on sequence and graph embedding learning, failing to effectively address the fine-grained modeling of user interests and preferences. This leads to reduced recommendation accuracy, where recommended PoIs may not align with users’ actual needs and preferences.

Second, existing models often overlook the spatio-temporal correlations between users and PoIs, making it difficult to directly utilize GNNs for modeling. For instance, two users may visit the same PoIs at the same times but in different sequences. If only PoI interactions are considered for information aggregation, their embeddings would be identical. However, the visitation order reflects different lifestyle patterns, suggesting that PoI representations should differ. Additionally, the influence of spatial location cannot be ignored, as users have varying tolerances for distance.

Third, most next PoI recommendation studies capture collaborative signals by randomly sampling one-hop PoI neighbors while neglecting higher-order connectivity. For example, two users may share common PoIs but have different higher-order neighbors connected through these PoIs. Leveraging such higher-order information can effectively alleviate data sparsity and improve model performance, yet current models remain inadequate in this aspect.

To address these challenges simultaneously—fine-grained modeling of user interests, learning user interest features based on spatio-temporal information, and incorporating higher-order interactions—this paper proposes a novel Fine-grained Spatio-Temporal Multi-semantic Hypergraph learning model (FSTMH) for next PoI recommendation.

Methodology

The FSTMH model consists of two main components: a fine-grained embedding module and a multi-level embedding module. The fine-grained embedding module employs geographic graphs and directed hypergraphs to describe geographical distributions and transition relationships between nodes, using contrastive learning to enhance PoI representation quality. The multi-level embedding module inputs multi-layer semantic hypergraphs into a multi-layer hypergraph convolutional network to learn PoI embeddings at different semantic levels. Finally, the model combines the PoI embeddings from both modules to generate the final top-K prediction results.

Fine-Grained Embedding Module

The fine-grained embedding module is designed to deeply explore user information by integrating geographic graphs and directed hypergraphs. This module utilizes different types of graph convolutional networks to learn diverse embeddings, extracting PoI representations that incorporate spatio-temporal interactions.

Geographic Graph Convolutional Network

In next PoI recommendation, geographic graphs play a vital role as they provide PoI location information. Users typically show greater interest in their current or nearby locations, highlighting the importance of geographic graphs in recommendation systems. The geographic relationship between PoIs is represented as a graph where nodes denote PoIs and edges connect PoI pairs within a specific distance threshold. The Haversine formula calculates the geographical distance between PoIs based on their latitude and longitude coordinates.
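
The Haversine distance and threshold-based edge construction described above can be sketched as follows. The source does not state the distance threshold, so the 1 km value, the PoI coordinates, and the function names `haversine_km` and `build_geo_graph` are hypothetical and for illustration only.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius used by the Haversine formula


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def build_geo_graph(coords, threshold_km):
    """Undirected edge list linking every PoI pair within threshold_km of each other.

    coords maps a PoI id to its (latitude, longitude).
    """
    edges = []
    for u, v in combinations(sorted(coords), 2):
        if haversine_km(*coords[u], *coords[v]) <= threshold_km:
            edges.append((u, v))
    return edges


# Hypothetical PoIs in Manhattan and a hypothetical 1 km threshold.
pois = {
    0: (40.7580, -73.9855),  # near Times Square
    1: (40.7527, -73.9772),  # near Grand Central
    2: (40.7061, -74.0087),  # near Wall Street
}
print(build_geo_graph(pois, threshold_km=1.0))  # [(0, 1)]
```

The pairwise loop is quadratic in the number of PoIs. For city-scale datasets, a spatial index (for example, a grid or ball tree) would be the usual way to limit candidate pairs.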

The geographic graph is then processed by a geographic graph convolutional network (GCN) to learn geographic PoI embeddings. The GCN encoder, implemented using LightGCN, aggregates neighborhood information to generate comprehensive PoI representations that capture spatial proximity patterns.
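
LightGCN propagates embeddings with the symmetrically normalized adjacency matrix, E^(k+1) = D^(-1/2) A D^(-1/2) E^(k). It uses no feature transformation and no nonlinearity, and it averages the outputs of layers 0..K with uniform weights 1/(K+1). The sketch below applies that rule to a geographic edge list. It is a minimal pure-Python illustration, not the authors' implementation, and the toy embeddings are hypothetical.

```python
import math


def normalized_adjacency(num_nodes, edges):
    """Symmetric normalisation D^{-1/2} A D^{-1/2}, stored as a dict of sparse rows."""
    neighbors = {i: set() for i in range(num_nodes)}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    deg = {i: len(neighbors[i]) for i in range(num_nodes)}
    return {
        i: {j: 1.0 / math.sqrt(deg[i] * deg[j]) for j in neighbors[i]}
        for i in range(num_nodes)
    }


def lightgcn(embeddings, edges, num_layers):
    """LightGCN-style propagation: no weights, no activation, mean over layers 0..L."""
    n, dim = len(embeddings), len(embeddings[0])
    norm_adj = normalized_adjacency(n, edges)
    layer = [list(e) for e in embeddings]
    total = [list(e) for e in embeddings]  # layer-0 contribution
    for _ in range(num_layers):
        nxt = [[0.0] * dim for _ in range(n)]
        for i, row in norm_adj.items():
            for j, w in row.items():
                for d in range(dim):
                    nxt[i][d] += w * layer[j][d]
        layer = nxt
        for i in range(n):
            for d in range(dim):
                total[i][d] += layer[i][d]
    return [[x / (num_layers + 1) for x in row] for row in total]


# Toy example: PoIs 0 and 1 are geographic neighbours, PoI 2 is isolated.
print(lightgcn([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [(0, 1)], num_layers=1))
```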

Directed Hypergraph Convolutional Network

Directed graphs model and analyze user preference patterns by examining user movement trajectories and visitation histories. These graphs reveal user tendencies, frequently visited areas, and potential activity patterns. The directed hypergraph represents PoI transition relationships, where each hyperedge denotes information flow from source nodes to specific target PoIs.

A directed hypergraph convolutional network encodes these complex transition relationships through a two-step aggregation process: node-to-edge propagation aggregates source node embeddings to generate intermediate hyperedge representations, and edge-to-node propagation disseminates these hyperedge embeddings to target nodes. After multiple layers of propagation, the embeddings from each layer are averaged to produce the final directed graph PoI embeddings.
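
The two-step aggregation can be sketched as below. Each directed hyperedge is written as a pair (source PoIs, target PoI). The source does not specify the normalization, so this sketch assumes simple means in both steps: a hyperedge averages its sources, and a target averages its incoming hyperedges. It also assumes that the input embeddings count as layer 0 in the final average. The function name is hypothetical.

```python
def directed_hypergraph_conv(embeddings, hyperedges, num_layers):
    """Node-to-edge then edge-to-node propagation over a directed hypergraph.

    embeddings: list of per-PoI vectors
    hyperedges: list of (sources, target), sources being a list of PoI ids
    """
    n, dim = len(embeddings), len(embeddings[0])
    layer = [list(e) for e in embeddings]
    total = [list(e) for e in embeddings]  # input embeddings treated as layer 0
    for _ in range(num_layers):
        # Step 1 (node-to-edge): a hyperedge is the mean of its source PoIs.
        edge_emb = [
            [sum(layer[s][d] for s in sources) / len(sources) for d in range(dim)]
            for sources, _target in hyperedges
        ]
        # Step 2 (edge-to-node): a target PoI averages the hyperedges pointing to it.
        acc = [[0.0] * dim for _ in range(n)]
        count = [0] * n
        for (_sources, target), e in zip(hyperedges, edge_emb):
            count[target] += 1
            for d in range(dim):
                acc[target][d] += e[d]
        layer = [[x / count[i] for x in acc[i]] if count[i] else [0.0] * dim
                 for i in range(n)]
        for i in range(n):
            for d in range(dim):
                total[i][d] += layer[i][d]
    # Average the per-layer embeddings.
    return [[x / (num_layers + 1) for x in row] for row in total]


# Toy example: visits to PoIs 0 and 1 both lead to PoI 2.
print(directed_hypergraph_conv([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], [([0, 1], 2)], 1))
```

Because information only flows from sources to targets, two users who visit the same PoIs in a different order produce different hyperedges, and therefore different embeddings.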

Contrastive Learning

To effectively integrate geographic and directed graph embeddings, contrastive learning enhances PoI representations by comparing different views of the same PoI. The same PoI’s representations from the geographic and directed modules form positive sample pairs, while different PoIs’ embeddings serve as negative samples. The contrastive loss function maximizes the similarity between positive pairs while minimizing that between negative pairs, resulting in richer and more discriminative PoI features.
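
A common way to implement this objective is the InfoNCE loss with a temperature τ. The source does not give the exact formula, so the sketch below makes three assumptions: it uses InfoNCE, it uses cosine similarity, and it contrasts in one direction only (geographic view against directed view). The default τ = 0.1 matches the best value reported in the parameter analysis. The function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(geo_view, dir_view, tau=0.1):
    """InfoNCE: PoI i in geo_view is pulled towards PoI i in dir_view (positive pair)
    and pushed away from every other PoI j in dir_view (negative pairs)."""
    n = len(geo_view)
    total = 0.0
    for i in range(n):
        logits = [cosine(geo_view[i], dir_view[j]) / tau for j in range(n)]
        m = max(logits)  # log-sum-exp trick for numerical stability
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / n


aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
print(info_nce(aligned, aligned))  # close to 0: views agree
print(info_nce(aligned, swapped))  # close to 10: views disagree
```

A symmetric variant, which averages the loss in both directions, is also widely used. Either choice fits the description above.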

Fine-Grained Hypergraph Convolutional Network

A collaborative hypergraph describes user-PoI interaction relationships, where each user’s trajectory forms a hyperedge and PoIs are nodes. To incorporate finer-grained information and capture higher-order interactions, the fine-grained hypergraph convolutional network takes the previously learned geographic and directed PoI embeddings as input.

An element-wise multiplication operation fuses these two embeddings, producing a comprehensive message embedding that retains original information while creating new interactive features. This fused embedding is then concatenated with user embeddings and processed through a fine-grained perception network to generate richer representations. By combining user embeddings with PoI embeddings, the model effectively reflects user visitation patterns. Relevant hyperedge information is aggregated to refine node representations, and after multiple propagation layers, the final fine-grained PoI embeddings are obtained by averaging and summing the layer outputs.
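
One propagation step of this pipeline might look like the sketch below. It performs the element-wise fusion, concatenates the result with the user embedding, passes it through a perception layer, and then aggregates node to hyperedge to node over user trajectories. The paper's perception network is not specified here, so a single tanh layer stands in for it. Mean aggregation is likewise an assumption. All function names and parameters are hypothetical, and stacking and combining several layers is omitted.

```python
import math


def fuse(geo_vec, dir_vec):
    """Element-wise (Hadamard) product of a PoI's geographic and directed embeddings."""
    return [g * d for g, d in zip(geo_vec, dir_vec)]


def perceive(message, user_vec, weight, bias):
    """Hypothetical perception layer: tanh(W [message ; user] + b)."""
    x = message + user_vec  # list concatenation acts as vector concatenation
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weight, bias)]


def fine_grained_layer(geo, direc, users, trajectories, weight, bias):
    """One node -> hyperedge -> node pass over the collaborative hypergraph.

    geo, direc:   dict PoI id -> vector (the two fine-grained views)
    users:        dict user id -> vector
    trajectories: list of (user id, [PoI ids]); each trajectory is one hyperedge
    weight, bias: perception-layer parameters, shape (out_dim, 2 * dim)
    """
    out_dim = len(bias)
    edge_embs = []
    for user, pois in trajectories:
        # User-conditioned messages for every PoI on this trajectory.
        msgs = [perceive(fuse(geo[p], direc[p]), users[user], weight, bias) for p in pois]
        edge_embs.append([sum(m[d] for m in msgs) / len(msgs) for d in range(out_dim)])
    refined = {}
    for poi in geo:
        # Each PoI averages the trajectories (hyperedges) that contain it.
        incident = [e for (_, pois), e in zip(trajectories, edge_embs) if poi in pois]
        refined[poi] = ([sum(e[d] for e in incident) / len(incident) for d in range(out_dim)]
                        if incident else [0.0] * out_dim)
    return refined
```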

Multi-Level Embedding Module

The multi-level embedding module aims to capture rich semantic relationships between users and PoIs, deeply learning potential behavioral patterns across different times and PoIs. A multi-level semantic hypergraph is introduced, where nodes represent users and PoIs, and hyperedges denote multi-dimensional semantic associations. This semantic modeling not only diversifies next PoI recommendations but also considers the variety of user interests and contextual richness.

Similar to the geographic graph convolutional network, the multi-level semantic hypergraph matrix is constructed using preset thresholds and multiplied with the initialized node embeddings. After multiple layers of learning, the final multi-level PoI embeddings are obtained, enabling the model to capture deeper and more complex user-PoI interaction patterns. This multi-level semantic modeling enhances both the depth of user interest understanding and the diversity of recommendation results.
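
The threshold-then-propagate procedure might be sketched as follows. The source does not say what quantity is thresholded, so the sketch makes several assumptions. A per-level matrix `weights[v][e]` gives a hypothetical association strength between node v and semantic hyperedge e, and node v joins hyperedge e when that strength reaches the level's threshold. Propagation uses mean normalization (D_v^-1 H D_e^-1 H^T X). Layers are averaged within a level and levels are summed. All names are illustrative.

```python
def build_incidence(weights, threshold):
    """Incidence matrix H (rows = nodes, columns = hyperedges) from thresholded weights."""
    return [[1 if w >= threshold else 0 for w in row] for row in weights]


def hypergraph_conv(H, X):
    """Mean-normalised hypergraph propagation: X' = Dv^-1 H De^-1 H^T X."""
    n, m, dim = len(H), len(H[0]), len(X[0])
    edge = []
    for e in range(m):
        members = [v for v in range(n) if H[v][e]]
        edge.append([sum(X[v][d] for v in members) / len(members) for d in range(dim)]
                    if members else [0.0] * dim)
    out = []
    for v in range(n):
        incident = [e for e in range(m) if H[v][e]]
        out.append([sum(edge[e][d] for e in incident) / len(incident) for d in range(dim)]
                   if incident else [0.0] * dim)
    return out


def multi_level_embeddings(levels, X, num_layers):
    """levels: list of (weights, threshold), one per semantic level."""
    n, dim = len(X), len(X[0])
    result = [[0.0] * dim for _ in range(n)]
    for weights, threshold in levels:
        H = build_incidence(weights, threshold)
        layer, acc = X, [list(row) for row in X]
        for _ in range(num_layers):
            layer = hypergraph_conv(H, layer)
            for v in range(n):
                for d in range(dim):
                    acc[v][d] += layer[v][d]
        # Average over layers 0..L within this level, then sum across levels.
        for v in range(n):
            for d in range(dim):
                result[v][d] += acc[v][d] / (num_layers + 1)
    return result
```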

Prediction and Optimization

For a given user and target PoI, the prediction score is computed using a dot product operation between the fine-grained and multi-level PoI embeddings, followed by a softmax function for normalization. The learning objective is defined as a cross-entropy loss function, where the model predicts the likelihood of a user visiting a PoI.
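
Dot-product scoring, softmax normalization, cross-entropy, and top-K selection can be sketched as below. The text does not fully specify which vectors enter the dot product. In this sketch, `query` stands for a hypothetical user-side vector derived from the fine-grained embeddings, and `candidates` stands for the multi-level PoI embeddings.

```python
import math


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def predict(query, candidates):
    """Dot-product score for every candidate PoI, normalised with softmax."""
    scores = [sum(q * c for q, c in zip(query, cand)) for cand in candidates]
    return softmax(scores)


def cross_entropy(probs, target):
    """Negative log-likelihood of the PoI the user actually visited next."""
    return -math.log(probs[target])


def top_k(probs, k):
    """Indices of the k most probable PoIs, highest first."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]


probs = predict([1.0, 0.0], [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(top_k(probs, 2))  # [0, 2]
```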

The final loss function combines the contrastive learning loss, cross-entropy loss, and L2 regularization to prevent overfitting. Weight parameters balance the contributions of different loss components during training.
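
Written out, the combined objective has the following form. The weight symbols λ1 and λ2, the parameter set Θ, and the summation over training pairs (u, p) are notation chosen here for illustration rather than taken from the paper. ŷ_{u,p} is the softmax probability assigned to the PoI p that user u actually visited next, and L_CL is the contrastive loss between the geographic and directed views.

```latex
\mathcal{L} = \mathcal{L}_{\mathrm{CE}} + \lambda_{1}\,\mathcal{L}_{\mathrm{CL}} + \lambda_{2}\,\lVert \Theta \rVert_{2}^{2},
\qquad
\mathcal{L}_{\mathrm{CE}} = -\sum_{(u,p)} \log \hat{y}_{u,p}
```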

Experiments

Datasets

To comprehensively evaluate FSTMH, experiments were conducted on three widely used LBSN datasets: Foursquare-NYC, Foursquare-TKY, and Gowalla. Using datasets from different cities and platforms helps assess how well the results generalize. Foursquare-NYC and Foursquare-TKY contain check-in records from New York and Tokyo, respectively, spanning April 2012 to February 2013. The Gowalla dataset includes check-ins from February 2009 to October 2010.

To reduce data sparsity and remove noisy records, PoIs visited by fewer than five users were removed from the Foursquare datasets, and PoIs with fewer than ten visitors were removed from Gowalla. The processed datasets were then split into training (80%), validation (10%), and test (10%) sets.
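
The filtering rule and the 80/10/10 split can be sketched as follows. Check-ins are assumed to be (user, PoI, timestamp) tuples. The source does not say whether the split is chronological, so this sketch assumes it is. The function names are hypothetical. Per the text, `min_users` would be 5 for the Foursquare datasets and 10 for Gowalla.

```python
from collections import defaultdict


def filter_pois(checkins, min_users):
    """Drop check-ins at PoIs visited by fewer than min_users distinct users."""
    visitors = defaultdict(set)
    for user, poi, _ts in checkins:
        visitors[poi].add(user)
    return [c for c in checkins if len(visitors[c[1]]) >= min_users]


def split_80_10_10(checkins):
    """Chronological split (an assumption) into train / validation / test."""
    ordered = sorted(checkins, key=lambda c: c[2])
    n = len(ordered)
    n_train = int(n * 0.8)
    n_val = int(n * 0.1)
    return (ordered[:n_train],
            ordered[n_train:n_train + n_val],
            ordered[n_train + n_val:])


# Hypothetical data: PoI "a" has 5 distinct visitors, PoI "b" only one.
data = [(f"u{i % 5}", "a", i) for i in range(10)] + [("z", "b", 100)]
train, val, test = split_80_10_10(filter_pois(data, min_users=5))
print(len(train), len(val), len(test))  # 8 1 1
```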

Evaluation Metrics

Two widely used metrics—Recall@K (R@K) and Normalized Discounted Cumulative Gain@K (N@K)—were employed to assess model performance. Recall@K measures the proportion of actually visited PoIs in the top-K recommendations relative to all visited PoIs in the test set, reflecting recommendation coverage. NDCG@K evaluates ranking quality by considering the positions of visited PoIs in the recommendation list, accounting for both relevance and order.

Higher values for both metrics indicate better performance. K values of 5 and 10 were selected to examine performance across different recommendation list lengths, providing insights into both precision and breadth of recommendations.
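
Both metrics can be computed per test instance as shown below, using binary relevance. When each test instance has exactly one ground-truth next PoI, Recall@K reduces to a hit indicator, and NDCG@K reduces to 1/log2(rank + 1) when the target appears at 1-based position rank within the top K. The function names are illustrative. Reported figures would average these values over all test instances.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of relevant PoIs that appear in the top-k of the ranked list."""
    hits = sum(1 for p in ranked[:k] if p in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k: DCG of the list divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(i + 2) for i, p in enumerate(ranked[:k]) if p in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal


ranked = ["c", "a", "d", "b", "e"]  # hypothetical top-5 list
print(recall_at_k(ranked, {"a"}, 5), ndcg_at_k(ranked, {"a"}, 5))
```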

Baseline Models

FSTMH was compared against eight representative next PoI recommendation models spanning RNN-based, self-attention-based, and GNN-based approaches:

  1. LSTM: An RNN variant effective at capturing long-term dependencies in sequential data.
  2. STGN: An LSTM-based model incorporating spatial and temporal information to learn spatio-temporal patterns.
  3. STAN: A self-attention-based model encoding distance and time intervals between check-ins to model spatio-temporal influences.
  4. LightGCN: A simplified GNN model aggregating and propagating information without nonlinear activations or feature transformations.
  5. SGRec: A GNN-based sequential PoI recommendation model enhancing one-hop neighbor collaborative signals via Seq2Graph.
  6. HCCF: A GNN-based self-supervised framework employing hypergraph-enhanced cross-view contrastive learning to capture local and global collaborative relationships.
  7. ASTHL: A GNN-based model disentangling spatio-temporal factors through decoupled central PoI learning and using contrastive learning to improve PoI representations.
  8. MSTHN: A GNN-based model utilizing local and global views for next PoI recommendation.

Results and Analysis

FSTMH outperformed all eight baseline models on all three datasets. Over the second-best model, its improvements ranged from 0.45% to 4.76% on NYC, from 3.12% to 6.32% on TKY, and from 3.42% to 8.05% on Gowalla. These results highlight two key strengths of FSTMH:

First, its innovative spatio-temporal modeling and contrastive learning techniques enhance PoI representations. Second, the fine-grained hypergraph convolutional network effectively learns higher-order features, mitigating data sparsity. The consistent superiority across geographically and behaviorally diverse datasets underscores FSTMH’s adaptability and generalization capabilities.

Notably, when LightGCN was used alone as a geographic information encoder, its performance was significantly lower than FSTMH’s. This gap arises because LightGCN relies solely on user-PoI interactions, ignoring behavioral sequences and contextual information, making it difficult to capture dynamic interest changes. In contrast, FSTMH integrates temporal, spatial, and other contextual dimensions, enabling more comprehensive and accurate user interest representations.

The importance of spatio-temporal information in next PoI recommendation is evident. Models like ASTHL and MSTHN, which incorporate spatio-temporal factors, consistently outperformed LightGCN. Even SGRec, which only considers temporal aspects, improved on LightGCN by 1.57% to 8.69% on the sparse Gowalla dataset. This further validates the critical role of spatio-temporal information in next PoI recommendation.

Overall, GNN-based models outperformed self-attention-based models, which in turn surpassed RNN-based approaches. This hierarchy reflects the inherent graph-structured nature of user-PoI interactions, where GNNs excel at modeling complex relationships and higher-order information.

Parameter Analysis

Impact of Layer Depth (L)

Experiments varying the number of hypergraph convolutional layers (L) showed that the optimal depth differs across datasets: 3 layers for NYC, 4 for TKY, and 2 for Gowalla. This suggests that L should be tuned to dataset complexity. Deeper networks capture more intricate patterns in denser data, while shallower networks suffice for sparser interactions.

Temperature Parameter (τ)

In the fine-grained embedding module, the temperature parameter τ of the contrastive loss scales the similarity scores before the softmax. A smaller τ sharpens the distribution and penalizes hard negatives more strongly, while a larger τ smooths it. Experiments with τ values of 0.1, 0.5, 1, 5, and 10 showed that τ = 0.1 yielded the best performance, suggesting that a sharper separation between positive and negative pairs better highlights the relationship features between users and PoIs.

Ablation Study

Ablation experiments on the NYC and TKY datasets evaluated the contributions of FSTMH’s components:

  1. Removing the fine-grained embedding module (w/o FG): Performance dropped significantly, underscoring the importance of spatio-temporal features in next PoI recommendation.
  2. Removing the multi-level embedding module (w/o ML): Performance declined noticeably, confirming the value of semantic relationship learning.
  3. Removing the geographic graph (w/o GG): Performance degradation was more severe than when removing the directed graph, indicating that spatial factors are more critical than temporal ones in PoI recommendation.
  4. Removing the directed graph (w/o DG): Performance decreased, but the model still outperformed the variant with both graph components removed, particularly on the dense TKY dataset, highlighting FSTMH's strength in handling higher-order information.

These results demonstrate that FSTMH’s components work synergistically to enhance overall performance, with each module contributing uniquely to the model’s success.

Case Study

A case study based on Figure 1 illustrated how FSTMH addresses two issues: missing spatio-temporal correlations and neglected higher-order connectivity.

For spatio-temporal modeling, FSTMH’s geographic and directed hypergraph convolutional networks effectively capture user visitation patterns, distinguishing between different sequences and spatial preferences. For example, it recognizes that two users visiting the same PoIs in different orders reflect distinct lifestyles, adjusting recommendations accordingly—work-related PoIs for one user and leisure-related ones for another.

Regarding higher-order connectivity, the multi-level embedding module processes multi-layer semantic hypergraphs to learn PoI embeddings at various semantic levels. This enables the model to leverage higher-order neighborhood differences among shared PoIs, recommending PoIs that align with each user’s unique interests and patterns.

These examples showcase FSTMH’s strengths in improving recommendation accuracy, diversity, and novelty by comprehensively considering spatio-temporal correlations and higher-order interactions.

Conclusion

In modern social networks, next PoI recommendation systems play a vital role in learning user preferences and preemptively suggesting interesting locations. However, existing methods face challenges such as coarse-grained modeling, difficulty capturing higher-order features, and neglecting spatio-temporal factors.

To address these issues, this paper proposed FSTMH, a fine-grained spatio-temporal multi-semantic hypergraph learning model for next PoI recommendation. FSTMH comprises two core modules: the fine-grained embedding module, which focuses on learning spatio-temporal factors, and the multi-level embedding module, which explores deep semantic information. The fine-grained module innovatively employs contrastive learning to strengthen PoI embeddings from geographic and directed graphs, while the multi-level module learns multi-layer semantic representations.

Extensive experiments on Foursquare-NYC, Foursquare-TKY, and Gowalla datasets demonstrated FSTMH’s superiority over baseline models, with performance improvements of at least 0.45%, 3.12%, and 3.43%, respectively. Ablation studies confirmed the effectiveness of each component.

Despite its advantages, FSTMH has limitations, including high computational demands from multiple graph convolutional networks and insufficient consideration of contextual factors like weather or user mood. Future work will explore integrating temporal knowledge graphs to capture dynamic PoI data changes and incorporating seasonal information for more personalized recommendations.

doi.org/10.19734/j.issn.1001-3695.2024.07.0288
