Legal Text Reading Comprehension Model Based on Multi-Granularity Enhancement and Answer Verification
Introduction
Legal text reading comprehension has emerged as a significant research area in natural language processing (NLP), requiring models to classify answer types, extract supporting evidence, and locate answer spans, often from limited annotated data. However, existing models frequently rely on single-granularity encoding and allow insufficient interaction between questions and legal texts, limiting their performance. To address these challenges, this paper proposes a legal text reading comprehension model that combines multi-granularity encoding, question-evidence attention, and an answer verification mechanism.
Legal documents are characterized by their formal and specialized language, containing terms with subtle semantic differences (e.g., “guarantee/bail” or “proof/evidence”) and complex logical relationships. These features make legal text comprehension particularly challenging. Additionally, legal datasets are often smaller than general-domain datasets due to the difficulty of obtaining expert annotations and the restricted availability of legal documents. Therefore, developing models that maximize the utilization of limited data is crucial for improving performance in legal reading comprehension tasks.
Challenges in Legal Text Reading Comprehension
Limited Data Availability
Legal datasets are typically smaller than general-domain datasets because legal annotations require domain expertise, and many legal documents are not publicly accessible. This scarcity of labeled data makes it difficult to train high-performance models.
Complex Language and Terminology
Legal texts employ formal and precise language, often containing terms that appear similar but carry distinct meanings. For example, “guarantee” and “bail” share overlapping characters but differ significantly in legal contexts. Existing models that rely solely on character-level encoding struggle to distinguish such terms effectively.
Multi-Hop Reasoning Requirements
Legal questions often require multi-step reasoning, where answers must be inferred by combining information from multiple sentences or paragraphs. This necessitates models capable of capturing long-range dependencies and logical relationships within the text.
Need for Explainability
Unlike general-domain question answering, legal applications demand high interpretability. Models must not only provide correct answers but also justify them with supporting evidence, making evidence extraction a critical subtask.
Proposed Model: MGEAV-MRC
To overcome these challenges, the proposed model, MGEAV-MRC (Multi-Granularity Enhanced Answer Verification for Machine Reading Comprehension), introduces three key innovations:
- Multi-Granularity Encoding Module
Existing models primarily use character-level encodings from pre-trained language models like RoBERTa, which fail to capture word-level and sequence-level semantics. The multi-granularity encoding module integrates three levels of representation:
• Character-level encoding: Captures fine-grained semantic details using RoBERTa.
• Word-level encoding: Incorporates word embeddings to better distinguish terms with similar characters but different meanings.
• Sequence-level encoding: Uses the [CLS] token representation to capture broader contextual information.
These representations are combined through summation, allowing the model to leverage diverse linguistic features without excessive computational overhead.
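The combination step can be illustrated with a minimal numpy sketch. The paper's exact architecture is not reproduced here; the function name and tensor shapes are assumptions, and the RoBERTa and word-embedding lookups are replaced by precomputed arrays. The key point is that the three granularities share a hidden dimension and are merged by elementwise summation, with the sequence-level [CLS] vector broadcast to every position:

```python
import numpy as np

def multi_granularity_encode(char_repr, word_repr, cls_repr):
    """Fuse three encoding granularities by summation (sketch).

    char_repr: (seq_len, d) character-level encodings, e.g. RoBERTa token outputs
    word_repr: (seq_len, d) word embeddings aligned to character positions
    cls_repr:  (d,) sequence-level [CLS] vector, broadcast to all positions
    """
    # Summation keeps the dimensionality at d, so downstream layers
    # need no extra parameters compared with character-only encoding.
    return char_repr + word_repr + cls_repr[None, :]

# Toy example with seq_len=5, d=4
rng = np.random.default_rng(0)
c = rng.normal(size=(5, 4))
w = rng.normal(size=(5, 4))
s = rng.normal(size=(4,))
fused = multi_granularity_encode(c, w, s)
```

Summation (rather than concatenation) is what keeps the fusion cheap: the output dimension stays `d`, matching the "without excessive computational overhead" claim above.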
- Question-Evidence Attention Mechanism
Answer classification is often treated as an auxiliary task in existing models, leading to weak interactions between questions and legal texts. The proposed model enhances this interaction by introducing an attention mechanism that leverages evidence extracted from the text. Specifically:
• Evidence weights are computed using a Transformer-based module, which identifies relevant sentences.
• These weights are then used to compute an attention matrix that highlights question-relevant portions of the legal text.
• The enriched representation improves answer classification by focusing on key clues within the document.
This approach strengthens the connection between answer classification and evidence extraction, ensuring that the model’s predictions are grounded in relevant textual evidence.
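The mechanism can be sketched as follows, under stated assumptions: the per-sentence evidence weights are taken as given (in the paper they come from a Transformer-based scorer), `sent_ids` maps each passage token to its sentence, and the attention itself is standard scaled dot-product attention. Function names are hypothetical:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def question_evidence_attention(question, passage, sent_ids, evidence_w):
    """Question-evidence attention (sketch).

    question:   (m, d) question token representations
    passage:    (n, d) legal-text token representations
    sent_ids:   (n,) sentence index of each passage token
    evidence_w: (num_sents,) evidence weight per sentence
                (produced by a Transformer-based scorer in the paper)
    Returns a (m, d) question-aware passage summary for answer classification.
    """
    # Scale each passage token by its sentence's evidence weight, so
    # attention is steered toward evidence-bearing sentences.
    weighted = passage * evidence_w[sent_ids][:, None]
    # Scaled dot-product attention between question and weighted passage.
    attn = softmax(question @ weighted.T / np.sqrt(question.shape[1]), axis=-1)
    return attn @ weighted
```

Because the evidence weights modulate the passage before attention is computed, tokens inside high-scoring evidence sentences dominate the attention matrix, which is how answer classification gets tied to the extracted evidence.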
- Answer Verification Mechanism
Human readers often cross-validate answers by considering both local (evidence-based) and global (document-wide) information. Inspired by this behavior, the model employs an answer verification mechanism:
• Local answer prediction: Generates an answer span based on evidence sentences.
• Global answer prediction: Produces an alternative answer using the entire document context.
• Verification step: Combines these predictions with learned weights to produce a final answer.
Additionally, a distance loss term encourages consistency between local and global predictions, further improving robustness.
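A minimal sketch of the verification step, with assumptions made explicit: the local and global predictors are represented only by their start/end logits, the learned combination weight is a single scalar `w` here, and the distance loss is written as a mean squared difference between the two softmaxed distributions (the paper's exact loss formulation may differ):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def verify_answer(local_start, local_end, global_start, global_end, w=0.5):
    """Combine local (evidence-based) and global (document-wide) span logits
    with a learned weight w, then decode the final span (sketch)."""
    start = w * local_start + (1 - w) * global_start
    end = w * local_end + (1 - w) * global_end
    s = int(np.argmax(start))
    e = s + int(np.argmax(end[s:]))  # constrain the end not to precede the start
    return s, e

def distance_loss(local_logits, global_logits):
    """Penalize disagreement between the local and global predictions,
    encouraging the two views to converge on the same answer."""
    p, q = softmax(local_logits), softmax(global_logits)
    return float(np.mean((p - q) ** 2))
```

At training time the distance loss is added to the span losses; at inference only `verify_answer` is needed, mirroring how a human reader cross-checks an evidence-based guess against the whole document.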
Experimental Results
The model was evaluated on multiple datasets, including Chinese legal benchmarks (CAIL2019, CAIL2020, CAIL2020-Enhanced) and the English multi-hop dataset HotpotQA. Key findings include:
Performance on Chinese Legal Datasets
• On CAIL2019, MGEAV-MRC achieved a joint F1 score of 76.48%, outperforming baseline models like FETSF-MRC (75.32%) and Baseline-RoBERTa (71.26%).
• On the more challenging CAIL2020 dataset, the model achieved a joint F1 of 64.16%, surpassing FETSF-MRC (62.63%) and DFGN (46.03%).
• Data augmentation (CAIL2020-Enhanced) further improved performance, with the model reaching a joint F1 of 70.82%, showing that the approach benefits from additional training data.
Generalization to English Datasets
On HotpotQA, MGEAV-MRC achieved a joint F1 of 69.39%, outperforming specialized models like LOUVRE (67.08%) and IP-LQR (61.10%). This highlights the model’s adaptability to different languages and domains.
Ablation Studies
Removing any of the three key components (multi-granularity encoding, question-evidence attention, or answer verification) led to performance drops:
• Without multi-granularity encoding, joint F1 decreased by 2.45%.
• Removing question-evidence attention reduced performance by 1.09%.
• Disabling answer verification caused a 1.35% decline.
These results confirm the contributions of each module to the model’s overall effectiveness.
Case Study and Interpretability
A case study on a legal document involving invoice fraud demonstrated the model’s ability to:
• Correctly classify the answer type (span-based).
• Identify relevant evidence sentences (e.g., “The defendant purchased fake invoices for 400 yuan”).
• Extract the precise answer (“400 yuan”).
Visualizations of attention weights showed that the model focuses on semantically critical phrases (e.g., “fake invoices”), unlike baselines that fixate on superficial matches.
Conclusion
The MGEAV-MRC model addresses key limitations in legal text reading comprehension by integrating multi-granularity encoding, question-evidence attention, and answer verification. Experimental results demonstrate its superiority over existing methods on both Chinese and English datasets. Future work could explore incorporating syntactic or graph-based structures to further enhance evidence extraction and reasoning capabilities.
DOI: 10.19734/j.issn.1001-3695.2024.09.0314