Hierarchical Reinforcement Learning for Knowledge Graph Reasoning with Bi-LSTM and Multi-Head Attention

Knowledge reasoning plays a crucial role in knowledge graph completion, addressing the incompleteness of knowledge graphs (KGs) that affects downstream tasks such as intelligent question answering, knowledge prediction, and recommendation systems. Traditional knowledge reasoning methods suffer from poor interpretability and an inability to exploit hidden semantic information, while reinforcement learning (RL)-based approaches struggle with sparse reward signals. To overcome these limitations, this paper introduces HRL-BM, a hierarchical reinforcement learning method that integrates Bidirectional Long Short-Term Memory (Bi-LSTM) and multi-head attention mechanisms.

Introduction

Knowledge graphs represent structured knowledge as triples (subject, relation, object), but they often suffer from incompleteness. Reasoning methods aim to infer missing relationships or entities by analyzing existing knowledge. Existing approaches can be broadly categorized into embedding-based, path-based, and reinforcement learning-based methods.

Embedding-based methods, such as TransE, TransH, and ComplEx, map entities and relations into continuous vector spaces and compute similarity scores for reasoning. While effective, these methods lack interpretability and struggle with multi-hop reasoning. Path-based methods, such as the Path-Ranking Algorithm (PRA), explore relational paths between entities but suffer from scalability issues in large KGs. Reinforcement learning-based methods, such as DeepPath and MINERVA, model reasoning as a Markov Decision Process (MDP), enabling interpretable multi-hop reasoning. However, these methods still face challenges in long-path reasoning and sparse rewards.

To address these issues, HRL-BM introduces a hierarchical reinforcement learning framework that decomposes reasoning into high-level cluster reasoning and low-level entity reasoning. The model leverages Bi-LSTM and multi-head attention to capture long-term dependencies and hidden semantic relationships. Additionally, a mutual reward mechanism enhances training efficiency by providing dense feedback signals.

Methodology

Hierarchical Reinforcement Learning Framework

HRL-BM divides the reasoning process into two levels:

  1. High-Level Agent (High Agent): Operates at the cluster level, where the KG is partitioned using spectral clustering. The agent navigates between clusters to identify the target cluster containing the answer entity.
  2. Low-Level Agent (Low Agent): Operates within the selected cluster, reasoning over entities and relations to find the target entity.

This hierarchical decomposition reduces the action space, improving reasoning efficiency. The two agents share state information, allowing coordinated reasoning.
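
As a rough illustration of the cluster-level partition, the sketch below applies off-the-shelf spectral clustering to a toy KG's entity adjacency matrix. The triples, the cluster count, and the use of scikit-learn are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: partition KG entities into clusters via spectral clustering.
# The toy triples and the number of clusters are illustrative assumptions.
import numpy as np
from sklearn.cluster import SpectralClustering

triples = [("Paris", "capital_of", "France"),
           ("Berlin", "capital_of", "Germany"),
           ("France", "neighbor_of", "Germany"),
           ("Seine", "flows_through", "Paris")]

entities = sorted({e for h, _, t in triples for e in (h, t)})
idx = {e: i for i, e in enumerate(entities)}

# Symmetric adjacency matrix: 1 if any relation links two entities.
adj = np.zeros((len(entities), len(entities)))
for h, _, t in triples:
    adj[idx[h], idx[t]] = adj[idx[t], idx[h]] = 1.0

labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(adj)
clusters = {e: int(labels[idx[e]]) for e in entities}
print(clusters)  # maps each entity to its cluster id
```

The High Agent then reasons over these cluster ids, while the Low Agent reasons over the entities inside the cluster handed off to it.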

State and Action Spaces

The state spaces for both agents are designed to capture relevant contextual information:
• High Agent State: Comprises the current cluster and the source cluster.

• Low Agent State: Includes the current entity, source entity, and query relation.

The action spaces consist of possible transitions:
• High Agent Actions: Neighboring clusters or a stop action to synchronize reasoning.

• Low Agent Actions: Outgoing relations and connected entities.
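
One way to organize these states and actions in code is sketched below; the field names and the stop-action sentinel follow the description above but are assumptions rather than the paper's actual implementation.

```python
# Illustrative containers for the two agents' states and actions
# (field names mirror the description above; they are assumptions).
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class HighState:
    current_cluster: int
    source_cluster: int

@dataclass
class LowState:
    current_entity: str
    source_entity: str
    query_relation: str

STOP = -1  # sentinel for the High Agent's stop/synchronization action

def high_actions(neighbor_clusters: List[int]) -> List[int]:
    # Neighboring clusters plus an explicit stop action.
    return neighbor_clusters + [STOP]

def low_actions(outgoing_edges: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    # Outgoing (relation, entity) edges of the current entity.
    return list(outgoing_edges)
```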

Reward Mechanism

To address sparse rewards, HRL-BM introduces three reward components:

  1. Path Efficiency Reward: Encourages shorter reasoning paths.
  2. Path Diversity Reward: Promotes exploration of diverse paths to avoid local optima.
  3. Mutual Reward: Ensures coordination between the two agents by evaluating action consistency.

The final reward combines these components, providing dense feedback for training.
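
A minimal sketch of how the three terms might be combined into one scalar signal is given below; the component formulas and the weighting coefficients are assumptions for illustration, not the paper's exact definitions.

```python
# Hypothetical combination of the three reward terms; the component
# formulas and the weights are illustrative assumptions.
def path_efficiency_reward(path_length: int) -> float:
    return 1.0 / max(path_length, 1)                  # shorter paths score higher

def path_diversity_reward(path, seen_paths) -> float:
    return 0.0 if tuple(path) in seen_paths else 1.0  # reward previously unseen paths

def mutual_reward(high_cluster: int, low_entity_cluster: int) -> float:
    # Consistency between the High Agent's chosen cluster and the
    # cluster containing the Low Agent's current entity.
    return 1.0 if high_cluster == low_entity_cluster else 0.0

def total_reward(path, seen_paths, high_cluster, low_entity_cluster,
                 w_eff=0.3, w_div=0.3, w_mut=0.4) -> float:
    return (w_eff * path_efficiency_reward(len(path))
            + w_div * path_diversity_reward(path, seen_paths)
            + w_mut * mutual_reward(high_cluster, low_entity_cluster))
```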

Bi-LSTM and Multi-Head Attention Fusion

The model processes historical reasoning paths using a Bi-LSTM to capture sequential dependencies. The Bi-LSTM generates hidden state representations that encode past actions and states. These representations are then processed by a multi-head attention mechanism, which dynamically assigns weights to different features, enhancing the model’s ability to focus on relevant semantic relationships.

• Bi-LSTM: Processes sequences bidirectionally, capturing both past and future context.

• Multi-Head Attention: Computes attention scores across multiple subspaces, improving feature extraction.

The fused representations are used to update the policy networks, guiding the agents’ action selection.
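
A simplified PyTorch realization of this fusion is sketched below; the embedding size, hidden size, and number of heads are assumed values, and the module is a stand-in for the paper's encoder rather than its exact architecture.

```python
# Simplified history encoder: Bi-LSTM over the reasoning path, followed by
# multi-head self-attention over the Bi-LSTM outputs. Dimensions are assumptions.
import torch
import torch.nn as nn

class HistoryEncoder(nn.Module):
    def __init__(self, emb_dim=100, hidden_dim=100, num_heads=4):
        super().__init__()
        self.bilstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        # Bi-LSTM output dimension is 2 * hidden_dim (forward + backward).
        self.attn = nn.MultiheadAttention(2 * hidden_dim, num_heads,
                                          batch_first=True)

    def forward(self, path_embeddings):           # (batch, seq_len, emb_dim)
        seq, _ = self.bilstm(path_embeddings)     # (batch, seq_len, 2*hidden_dim)
        fused, _ = self.attn(seq, seq, seq)       # self-attention over the path
        return fused[:, -1, :]                    # representation of the last step

# Example: a batch of 2 paths, each 5 steps long, with 100-dim step embeddings.
enc = HistoryEncoder()
history = enc(torch.randn(2, 5, 100))
print(history.shape)  # torch.Size([2, 200])
```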

Policy Networks

The policy networks for both agents predict the next action based on the current state and historical information:
• High-Level Policy Network: Selects the next cluster using cluster embeddings and attention-weighted history.

• Low-Level Policy Network: Chooses the next entity-relation pair using entity and relation embeddings.

The networks employ ReLU activation and softmax normalization to compute action probabilities.
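
A minimal sketch of such a policy head is given below, assuming the fused history/state vector is scored against candidate action embeddings; the scoring scheme and dimensions are illustrative assumptions, not the paper's exact design.

```python
# Illustrative policy head: scores candidate actions from the fused state/history
# vector using a ReLU MLP and softmax normalization, as described above.
import torch
import torch.nn as nn

class PolicyNetwork(nn.Module):
    def __init__(self, state_dim=300, action_dim=200, hidden_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, action_dim),
        )

    def forward(self, state, candidate_actions):
        # state: (batch, state_dim); candidate_actions: (batch, n_actions, action_dim)
        query = self.mlp(state).unsqueeze(2)              # (batch, action_dim, 1)
        scores = torch.bmm(candidate_actions, query)      # (batch, n_actions, 1)
        return torch.softmax(scores.squeeze(-1), dim=-1)  # action probabilities

policy = PolicyNetwork()
probs = policy(torch.randn(2, 300), torch.randn(2, 7, 200))
print(probs.shape)  # torch.Size([2, 7])
```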

Experiments

Datasets and Evaluation Metrics

Experiments were conducted on three standard KG datasets:

  1. FB15K-237: A subset of Freebase with complex relational patterns.
  2. WN18RR: A WordNet subset with semantic relationships.
  3. NELL-995: A dataset derived from the Never-Ending Language Learning system.

Performance was evaluated using:
• Hits@k: The proportion of test queries for which the correct entity appears among the top-k predictions.

• Mean Reciprocal Rank (MRR): The average of the reciprocal ranks of the correct entities.
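
For reference, the sketch below shows how these two metrics are typically computed from the ranks assigned to the correct entities (the example ranks are hypothetical).

```python
# Standard Hits@k and MRR over a list of ranks (rank 1 = correct entity first).
def hits_at_k(ranks, k):
    return sum(r <= k for r in ranks) / len(ranks)

def mrr(ranks):
    return sum(1.0 / r for r in ranks) / len(ranks)

ranks = [1, 3, 2, 10, 1]      # hypothetical ranks of the correct entities
print(hits_at_k(ranks, 1))    # 0.4
print(round(mrr(ranks), 3))   # 0.587
```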

Results

HRL-BM outperformed baseline methods across all datasets:
• FB15K-237: Achieved a Hits@1 of 0.642 and MRR of 0.705, surpassing CURL by 9.5%.

• WN18RR: Demonstrated strong performance in semantic reasoning.

• NELL-995: Showed robustness in short-path reasoning despite the dataset’s sparsity.

The hierarchical approach and attention mechanism significantly improved reasoning accuracy, particularly in long-path scenarios.

Ablation Studies

Ablation experiments confirmed the contributions of key components:

  1. Removing Bi-LSTM and Multi-Head Attention: Performance dropped significantly, highlighting their role in capturing semantic dependencies.
  2. Removing Mutual Rewards: Led to slower convergence, underscoring the importance of dense feedback signals.

Case Study

A reasoning example from NELL-995 illustrated the model’s ability to correct errors by backtracking via inverse relations, demonstrating robust path reasoning.

Parameter Analysis

• Path Length: Optimal performance was observed at a path length of 3, balancing information depth and sparsity.

• Attention Heads: Eight attention heads provided the best feature extraction capability.

Conclusion

HRL-BM advances knowledge graph reasoning by integrating hierarchical reinforcement learning with Bi-LSTM and multi-head attention. The method effectively addresses challenges in interpretability, hidden semantic extraction, and sparse rewards. Experimental results demonstrate superior performance in both long-path and short-path reasoning tasks.

Future work may explore integrating large language models (LLMs) to enhance semantic understanding and multi-task learning capabilities.

doi.org/10.19734/j.issn.1001-3695.2024.06.0197
