Double Validation and Rectification Progressive Prompt for Equations in Math Word Problem Based on Large Language Model
Introduction
Mathematics is an indispensable subject in foundational education, serving as both the cornerstone for the development of fundamental disciplines and the basis for artificial intelligence research. Solving mathematical word problems (MWPs) enables researchers to refine algorithms and develop reasoning models that mimic human thought processes. Human problem-solving often involves decomposing complex tasks into sequential steps, an approach mirrored by the chain-of-thought (CoT) method. CoT leverages intuitive reasoning to maintain interpretability while reducing the difficulty of generating direct answers.
Despite the success of large language models (LLMs) using CoT in solving single-unknown MWPs, existing research lacks methodologies tailored for equation-based problems. Equation problems are highly sensitive to reasoning steps—errors in formulating equations can cascade into subsequent mistakes. To address this, we propose the 2ERP (Double Validation and Rectification Progressive Prompt) method, which progressively validates and corrects reasoning paths to output the most probable solution.
Background and Motivation
Challenges in Equation-Based MWPs
Equation problems introduce complexities beyond single-unknown scenarios:
- Step Sensitivity: Errors in equation formulation propagate through subsequent calculations.
- Multiple Unknowns: Increased unknowns expand the solution space exponentially.
- Validation Gaps: Existing methods lack mechanisms to verify both equation correctness and computational accuracy.
Traditional approaches like self-consistency rely on majority voting to select answers but risk perpetuating systematic errors. Other correction methods, such as proximity-based hints (“the answer is near [H]”) or exclusion prompts (“the answer is not [H]”), are inefficient for multi-unknown problems.
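The majority-voting failure mode described above is easy to see in a minimal sketch (the sampled answers here are illustrative values, not from the paper):

```python
from collections import Counter

def self_consistency(answers):
    """Pick the most frequent answer among sampled reasoning paths.

    A systematic error shared by many paths wins the vote, which is
    exactly the failure mode 2ERP is designed to avoid.
    """
    counts = Counter(answers)
    answer, _ = counts.most_common(1)[0]
    return answer

# Five sampled paths; three share the same systematic mistake.
print(self_consistency([39, 39, 39, 15, 15]))  # majority vote returns 39
```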
The Role of Reflective Reasoning
Human decision-making involves three cognitive layers: autonomous, algorithmic, and reflective minds. Reflective thinking aids in error correction and validation. Current LLM-based CoT methods operate on intuitive reasoning without deliberate validation. The 2ERP method introduces a reflective mechanism through double validation, ensuring that solutions are both logically and numerically sound.
The 2ERP Method
Overview
2ERP is a zero-shot prompting method designed for equation-based MWPs. It integrates:
- Double Validation: Verifies equation correctness and computational accuracy.
- Progressive Rectification: Iteratively narrows the solution space by eliminating incorrect paths.
Key Components
- Initialization
The process begins by inputting the problem into the LLM to generate an initial reasoning path and answer. The method initializes two sets:
• Potential Correct Set (Cp): Stores candidate answers that pass validation.
• Incorrect Set (Cin): Tracks answers that fail validation.
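The initialization step can be sketched as follows; `llm` is a hypothetical stand-in for any chat-completion callable, not the authors' interface:

```python
def initialize(problem, llm):
    """Sketch of 2ERP initialization: one LLM call yields a reasoning
    path and a first candidate answer; both candidate sets start empty.
    `llm` is a hypothetical stand-in for a chat-completion callable."""
    path, answer = llm(f"Solve step by step: {problem}")
    potential_correct = set()  # Cp: answers that pass double validation
    incorrect = set()          # Cin: answers that fail validation
    return path, answer, potential_correct, incorrect

# Stubbed model call for illustration only.
stub = lambda prompt: ("23 + y - x = 8 -> x = 15 + y", 15)
path, answer, cp, cin = initialize("the bus problem", stub)
print(answer, cp, cin)  # 15 set() set()
```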
- Validation Module
This module ensures the correctness of both the equation and its solution:
• Equation Verification: The model extracts the equation from the reasoning path and validates its alignment with the problem text.
• Numerical Verification: The answer is substituted back into the equation to confirm computational accuracy.
For example, given the problem:
“The bus initially had 23 children. At the stop, 24 got on while some got off, leaving 8. How many more children got off than got on?”
The model generates equations like:
• Initial children + Children who boarded – Children who alighted = Final count.
• In symbols: 23 + y – x = 8, where y children boarded and x alighted.
The validation module checks whether the derived expression (e.g., x = 15 + y, so x = 39 when y = 24) satisfies the equation.
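Numerical verification amounts to plugging the candidate values back into the equation; a minimal sketch using the bus example:

```python
def verify_numerically(x, y):
    """Numerical-verification step for the bus example: substitute the
    candidate values back into 23 + y - x = 8 (a minimal sketch)."""
    return 23 + y - x == 8

# y = 24 boarded; x = 39 alighted, so x - y = 15 more got off than on.
assert verify_numerically(x=39, y=24)
assert not verify_numerically(x=20, y=24)
```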
- Rectification Process
If validation fails, the method employs three rectification strategies:
- Proximity Hint: Suggests answers near validated candidates (e.g., “the answer may be near [A]”).
- Exclusion Hint: Rules out incorrect answers (e.g., “[A] is not the answer”).
- Equation Correction: Reconstructs the equation if it is fundamentally flawed.
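The three strategies can be expressed as prompt templates; this is a hedged sketch, and the paper's exact wording may differ:

```python
def rectification_prompt(problem, candidate=None, validated=False):
    """Sketch of the three rectification strategies as prompt templates;
    the paper's exact prompt wording may differ."""
    if validated and candidate is not None:
        # Proximity hint: steer generation near a validated candidate.
        return f"{problem}\nHint: the answer may be near {candidate}."
    if candidate is not None:
        # Exclusion hint: rule out an answer that failed validation.
        return f"{problem}\nHint: {candidate} is not the answer."
    # Equation correction: ask the model to rebuild a flawed equation.
    return f"{problem}\nThe previous equation was flawed; re-derive it from the problem text."
```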
Iterative Refinement
The process repeats for a predefined number of iterations (k=6), balancing accuracy and efficiency. Each iteration refines the solution space, converging toward the correct answer.
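The outer loop described above can be sketched as follows (k = 6 as in the paper); `llm` and `validate` are stand-ins, and this is an illustrative sketch rather than the authors' implementation:

```python
def two_erp(problem, llm, validate, k=6):
    """Illustrative outer loop of 2ERP: generate, validate, and rectify
    for k rounds, then output the most probable validated answer."""
    cp, cin = set(), set()   # potential-correct (Cp) / incorrect (Cin) sets
    prompt = problem
    answer = None
    for _ in range(k):
        answer = llm(prompt)
        if validate(answer):             # double validation passed
            cp.add(answer)
            prompt = f"{problem}\nHint: the answer may be near {answer}."
        else:                            # failed: exclude and retry
            cin.add(answer)
            prompt = f"{problem}\nHint: {answer} is not the answer."
    # Output a validated answer if any round produced one, else the last.
    return next(iter(cp), answer)

# Toy run: the "model" guesses wrongly twice, then converges on 15.
guesses = iter([10, 12, 15, 15, 15, 15])
result = two_erp("bus problem", lambda p: next(guesses), lambda a: a == 15)
print(result)  # 15
```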
Experimental Evaluation
Datasets
Experiments were conducted on six datasets spanning Chinese and English MWPs:
- HMWP: Chinese dataset with arithmetic, linear, and nonlinear equations.
- CM17K: Chinese dataset for grades 6–12, covering diverse MWP types.
- Math23K: Large Chinese dataset requiring multi-step reasoning.
- Draw: English dataset with systems of equations.
- GSM8K: High-quality English dataset for single-unknown problems.
- SVAMP: English dataset with irrelevant numerical distractions.
Baselines
2ERP was compared against seven methods:
• Zero-shot: Direct, Zero-Shot-CoT, Plan-and-Solve, PRP-CoT.
• Few-shot: Manual-CoT, Auto-CoT, PHP-CoT.
Results
2ERP achieved state-of-the-art performance:
• Average Accuracy: 66.2% across all datasets.
• Equation Datasets: 6.9% improvement over PRP-CoT on HMWP, CM17K, and Draw.
• Single-Unknown Datasets: 8.6% improvement over few-shot methods on GSM8K and SVAMP.
Key Findings:
- Equation Problems: 2ERP excels in multi-unknown scenarios due to its validation mechanism.
- Language Agnosticism: The method performs consistently across Chinese and English datasets.
- Step Interpretability: Numerical explanations enhance understanding of equation derivations.
Ablation Studies
- Iteration Rounds: Accuracy stabilizes at k=6, with optimal trade-offs between speed and precision.
- Solution Space Reduction: Combining proximity and exclusion hints improves accuracy by 11–12% over initialization-only methods.
- Validation Impact: Double validation boosts accuracy by 2.8–3.1% on equation datasets.
Case Studies
Example 1: Single-Unknown Problem (SVAMP)
Problem:
“23 children were on a bus. At the stop, 24 boarded while some alighted, leaving 8. How many more alighted than boarded?”
Initial Answer:
• Equation: 23 + y – x = 8 → x = 15 + y, where y children boarded and x alighted.
• Answer: x – y = 15.
Validation:
• Substituted answer satisfies the equation.
• Numerical explanation confirms logical consistency.
Final Answer: 15.
Example 2: Equation Problem (HMWP)
Problem:
“A city’s tourist count grew by 30% (inbound) and 20% (outbound) this year, totaling 226,000. Last year, inbound tourists exceeded outbound by 20,000. Find last year’s counts.”
Initial Answer:
• Equations: x + y = 226; x – y = 20 → x = 123, y = 103 (counts in thousands).
Validation Failure:
• Corrected equations: 1.3x + 1.2y = 226; x – y = 20.
• Final answer: x = 100, y = 80.
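The corrected system can be checked directly (tourist counts in thousands):

```python
# Corrected HMWP system, counts in thousands:
#   1.3x + 1.2y = 226   (this year's total after 30% / 20% growth)
#   x - y = 20          (last year inbound exceeded outbound by 20,000)
x, y = 100, 80
assert abs(1.3 * x + 1.2 * y - 226) < 1e-9  # 130 + 96 = 226
assert x - y == 20
```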
Discussion
Advantages of 2ERP
- Robustness: Dual validation mitigates cascading errors.
- Generality: Applicable to single and multi-unknown problems.
- Interpretability: Detailed numerical explanations enhance transparency.
Limitations
- Mathematical Prerequisites: Struggles with problems requiring geometric or advanced algebraic knowledge.
- Model Dependency: Performance varies with LLM capabilities (e.g., ChatGLM3-6B outperforms earlier versions).
Future Directions
- Knowledge Injection: Incorporate domain-specific knowledge (e.g., geometry) to improve accuracy.
- Smaller Models: Adapt 2ERP for models with under 1B parameters.
Conclusion
The 2ERP method advances equation-based MWP solving by integrating double validation and progressive rectification. Its reflective reasoning mechanism ensures logical and numerical correctness, achieving significant improvements across diverse datasets. By bridging the gap between intuitive and deliberate reasoning, 2ERP sets a new standard for reliable and interpretable mathematical problem-solving in LLMs.
doi.org/10.19734/j.issn.1001-3695.2024.05.0153