Hierarchical Semantic Parsing Approach for Multi-Hop Question Answering on Knowledge Graphs

Introduction

Knowledge graphs (KGs) have become fundamental tools for organizing structured semantic information, storing entities and their relationships in a machine-readable format. They are widely applied in natural language processing tasks such as question answering, dialogue systems, and recommendation systems. Among these applications, knowledge graph question answering (KGQA) stands out as a critical area, leveraging structured semantic information to interpret user queries and generate accurate responses. However, while KGQA systems excel at handling simple questions requiring single-hop reasoning, they often struggle with complex multi-hop questions that involve multiple entities and layered relationships.

Multi-hop questions demand reasoning across several interconnected facts in the knowledge graph. For example, answering “When did the movies acted by Faizon Love release?” requires identifying the actor, retrieving the movies he starred in, and then finding their release years. Traditional semantic parsing methods, which convert natural language questions into structured queries, perform well on single-hop questions but falter when faced with multi-step reasoning. The primary challenges include accurately parsing deep semantic structures, mapping relationships to KG elements, and constructing coherent reasoning paths.

Recent advancements in large language models (LLMs) offer promising solutions to these challenges. LLMs, with their extensive pre-training and strong generalization capabilities, enhance semantic parsing and logical reasoning. However, existing approaches like KG-GPT and StructGPT still face limitations in deep semantic parsing and interpretability. To address these gaps, this paper introduces HL-GPT (Hierarchical Parsing and Logical Reasoning GPT), a novel framework designed to improve multi-hop KGQA through hierarchical semantic parsing and structured reasoning path construction.

Challenges in Multi-Hop Question Answering

Multi-hop KGQA presents several key challenges that hinder performance:

  1. Deep Semantic Parsing – Complex questions often contain nested relationships that require layered interpretation. Traditional methods struggle to decompose these relationships systematically, leading to incomplete or inaccurate parsing.
  2. Relationship Mapping – Accurately aligning parsed relationships with KG elements is non-trivial, especially when multiple candidate relations exist.
  3. Reasoning Path Construction – Building a coherent reasoning path from the question entity to the answer entity demands precise intermediate step retrieval and logical consistency.
  4. Interpretability – Many existing models generate answers without transparent reasoning traces, making it difficult to validate correctness or debug errors.

HL-GPT addresses these challenges by leveraging LLMs for hierarchical parsing, fine-tuned embedding models for relationship mapping, and structured reasoning paths for interpretable answers.

The HL-GPT Framework

The HL-GPT framework consists of three core stages: hierarchical semantic parsing, graph retrieval, and logical reasoning.
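The three stages can be sketched as a simple pipeline. This is an illustrative skeleton, not the paper's implementation: the function names and the lambda stand-ins for each stage are hypothetical, and real stages would call an LLM, an embedding matcher, and a graph database respectively.

```python
# Hypothetical top-level pipeline mirroring the three HL-GPT stages.
def hl_gpt_answer(question, parse_fn, retrieve_fn, reason_fn):
    logical_form = parse_fn(question)   # 1. hierarchical semantic parsing
    facts = retrieve_fn(logical_form)   # 2. graph retrieval
    return reason_fn(question, facts)   # 3. logical reasoning

# Toy stand-ins for each stage, using the paper's running example.
answer = hl_gpt_answer(
    "When did the movies acted by Faizon Love release?",
    parse_fn=lambda q: ("Faizon Love", "acted by", "release year"),
    retrieve_fn=lambda lf: [("Who's Your Caddy?", "release_year", "2007")],
    reason_fn=lambda q, facts: facts,
)
```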

  1. Hierarchical Semantic Parsing

This stage decomposes the question into manageable layers, extracting entities and relationships progressively.

First-Level Parsing focuses on identifying explicit entities and their immediate relationships. For the question “When did the movies acted by Faizon Love release?”, the model extracts the entity “Faizon Love” and the direct relationship “acted by.” This step is implemented using LLM-based prompt engineering, where the model is guided to isolate key components.

Second-Level Parsing delves deeper into implicit relationships. The question is decomposed into sub-queries: “When did the movies release?” and “acted by Faizon Love.” The LLM then extracts the deeper relationship “release year,” enabling multi-hop reasoning. This layered approach reduces ambiguity and enhances parsing accuracy.
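The two parsing levels above are driven by prompts rather than trained parsers. The templates below are hypothetical illustrations of how such prompts might be structured; the paper does not publish its exact prompt wording.

```python
# Hypothetical prompt templates for the two parsing levels.
FIRST_LEVEL_PROMPT = (
    "Extract the explicit entity and its direct relationship from the question.\n"
    "Question: {question}\n"
    "Output as (entity, relation):"
)

SECOND_LEVEL_PROMPT = (
    "Decompose the question into sub-queries and name the deeper relationship.\n"
    "Question: {question}\n"
    "Sub-queries and deeper relation:"
)

def build_parsing_prompts(question: str) -> dict:
    """Return the prompt sent to the LLM at each parsing level."""
    return {
        "first_level": FIRST_LEVEL_PROMPT.format(question=question),
        "second_level": SECOND_LEVEL_PROMPT.format(question=question),
    }

prompts = build_parsing_prompts("When did the movies acted by Faizon Love release?")
```

For the running example, the first-level response would yield ("Faizon Love", "acted by") and the second-level response the deeper relation "release year".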

  2. Graph Retrieval

After parsing, the extracted logical forms (e.g., (“Faizon Love”, “acted by”, “release year”)) are mapped to the KG. A pre-trained embedding model fine-tuned on domain-specific relationship pairs computes cosine similarity between parsed relations and KG relations. For instance, “acted by” is matched with “starred_actors,” and “release year” is aligned with the corresponding KG property.
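The similarity matching step can be sketched as follows. The three-dimensional toy vectors stand in for the output of the fine-tuned embedding model, and the relation names are assumed from the MetaQA schema; only the cosine-similarity matching logic itself is faithful to the description above.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy embeddings standing in for a fine-tuned encoder's output.
kg_relations = {
    "starred_actors": [0.9, 0.1, 0.0],
    "directed_by":    [0.1, 0.9, 0.0],
    "release_year":   [0.0, 0.1, 0.9],
}

def map_relation(parsed_vec, kg_relations):
    """Pick the KG relation whose embedding is closest to the parsed relation."""
    return max(kg_relations, key=lambda r: cosine(parsed_vec, kg_relations[r]))

# A vector for "acted by" lands nearest starred_actors in this toy space.
best = map_relation([0.8, 0.2, 0.1], kg_relations)
```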

The mapped relationships are then converted into structured queries (e.g., Cypher for graph databases) to retrieve relevant entities. In the example, the system retrieves movies linked to “Faizon Love” and their respective release years.
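A two-hop query under an assumed Neo4j-style movie schema might be assembled like this. The node labels, relationship names, and direction are illustrative assumptions, not the paper's actual schema, and a production system would pass the entity name as a query parameter rather than interpolating it.

```python
def build_cypher(entity: str, hop1: str, hop2: str) -> str:
    """Assemble a two-hop Cypher query under an assumed movie-KG schema.

    NOTE: string interpolation is used only for readability; real code
    should use Cypher parameters to avoid injection.
    """
    return (
        f"MATCH (a:Actor {{name: '{entity}'}})"
        f"<-[:{hop1}]-(m:Movie)-[:{hop2}]->(y) "
        "RETURN m.title, y"
    )

query = build_cypher("Faizon Love", "starred_actors", "release_year")
```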

  3. Logical Reasoning

The final stage integrates retrieved facts with the original question to generate an interpretable answer. The KG triples are converted into natural language descriptions, providing context for the LLM. For the sample question, the model outputs:

“The movie ‘Who’s Your Caddy?’ starring Faizon Love was released in 2007, and the movie ‘Couples Retreat’ was released in 2009.”

This response not only answers the query but also traces the reasoning path, improving transparency.
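The triple-to-text conversion that feeds the LLM can be sketched as a small templating step. The template wording is a hypothetical example; the framework's actual verbalization rules are not specified in detail.

```python
def triples_to_text(triples):
    """Verbalize KG triples as simple sentences for the LLM's context window."""
    # Hypothetical per-relation templates; unknown relations fall back
    # to a generic "head relation tail" sentence.
    templates = {
        "release_year": "The movie '{h}' was released in {t}.",
    }
    sentences = []
    for h, r, t in triples:
        template = templates.get(r, "{h} {r} {t}.")
        sentences.append(template.format(h=h, r=r, t=t))
    return " ".join(sentences)

context = triples_to_text([
    ("Who's Your Caddy?", "release_year", "2007"),
    ("Couples Retreat", "release_year", "2009"),
])
```

The resulting text is concatenated with the original question so the LLM can compose a traceable answer like the one shown above.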

Experimental Evaluation

HL-GPT was evaluated on four datasets: MetaQA (movie domain), COKG-DATA (COVID-19 domain), AeroQA (aviation safety), and NLPCC-MH (open-domain Chinese QA).

Results on MetaQA

HL-GPT achieved near-perfect accuracy on single-hop questions (99.94%) and maintained high performance on multi-hop tasks (98.33% for 2-hop, 97.75% for 3-hop). Compared to supervised baselines like KV-Mem and EmbedKGQA, HL-GPT showed significant improvements, particularly in complex reasoning. For example, it outperformed KV-Mem by 48.85 points on 3-hop questions.

Results on Domain-Specific Datasets

• COKG-DATA: HL-GPT achieved 99.99% (1-hop), 99.53% (2-hop), and 99.08% (3-hop) accuracy, surpassing COKG-QA by up to 6.63 points.

• AeroQA: The framework scored 96.05% (1-hop) and 75.85% (2-hop), outperforming KITLM by 32.33 points on 2-hop questions.

• NLPCC-MH: HL-GPT attained 75.91% (2-hop) and 69.56% (3-hop), exceeding DPQA by 9.8 points.

Ablation Studies

  1. Framework Impact: Without HL-GPT’s hierarchical parsing, ChatGPT’s accuracy dropped by 59.05 points on 3-hop tasks.
  2. Sample Efficiency: Performance plateaued at roughly 12 training samples, indicating a good balance between data efficiency and generalization.

Case Study

Traditional models often provide fragmented answers. For example:
• Question: “The movies starred by Craig Stevens were written by who?”

• Baseline Output: “Jonas Ward, Martin Berkeley, William Alland.”

• HL-GPT Output: “The movie ‘Buchanan Rides Alone’ was written by Jonas Ward, and ‘The Deadly Mantis’ was written by Martin Berkeley and William Alland.”

HL-GPT’s responses are detailed and traceable, enhancing usability.

Conclusion

HL-GPT advances multi-hop KGQA through hierarchical semantic parsing and structured reasoning. By leveraging LLMs for deep parsing and fine-grained relationship mapping, the framework achieves state-of-the-art accuracy while improving interpretability. Future work may explore dynamic sample selection and enhanced embedding techniques to further reduce reliance on training data.

doi.org/10.19734/j.issn.1001-3695.2024.07.0262
