Enhancing Machine Solving of Math Word Problems through Semantic Understanding Augmentation
The rapid advancement of technology has significantly impacted education, particularly through the integration of artificial intelligence (AI) in learning and teaching processes. Among the various applications of AI, machine solving of math word problems (MWPs) stands out as a fundamental yet challenging task. The ability of machines to automatically solve MWPs not only alleviates the burden on students but also provides valuable insights into problem-solving strategies, thereby fostering deeper learning. This article presents a novel machine-solving method for MWPs, leveraging semantic understanding enhancement to address the limitations of existing approaches.

Introduction

Traditional methods for solving MWPs have evolved from rule-based systems to statistical learning and semantic analysis techniques. Early approaches relied heavily on manual feature extraction and template matching, which were labor-intensive and limited in accuracy. With the advent of deep learning, sequence models like recurrent neural networks (RNNs) and long short-term memory (LSTM) networks improved MWP solving by automating feature learning. However, these models often struggled with complex semantic structures and contextual nuances in problem descriptions.

The emergence of pre-trained language models (PLMs), such as BERT and GPT, revolutionized natural language processing (NLP) tasks, including MWP solving. PLMs excel in capturing contextual relationships and semantic dependencies, making them well-suited for understanding and reasoning through MWPs. Despite their success, standard PLMs often fail to fully grasp the intricate semantic variations and implicit logical relationships in MWPs, leading to incorrect equation predictions and answers.

To overcome these challenges, this work introduces a semantically enhanced pre-trained language model, SeBERT, combined with a novel solving framework, SeBERT-PT, which integrates pooling and tree-based decoding mechanisms. Additionally, a confidence-based judgment mechanism is proposed to filter unreliable predictions, ensuring both accuracy and efficiency in the solving process.

Methodology

1. Semantic-Enhanced Pre-Training with SeBERT

The foundation of the proposed method lies in SeBERT, a BERT-based model augmented with multi-granularity knowledge modeling and continuous semantic integration strategies.

Multi-Granularity Knowledge Modeling
Unlike standard BERT, which primarily masks individual tokens, SeBERT employs a three-level masking strategy:
• Word-Level Masking: Similar to BERT, random tokens are masked to train the model on contextual coherence.

• Phrase-Level Masking: Phrases, as meaningful conceptual units, are masked to enhance the model’s understanding of multi-word expressions. For Chinese texts, segmentation tools identify phrase boundaries, while English texts rely on syntactic parsing.

• Entity-Level Masking: Named entities (e.g., “Xiao Ming,” “book”) are masked to capture implicit relationships between variables and their attributes. This stage ensures the model learns to associate entities with their numerical and logical contexts.
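The key difference between the three masking levels is the span of tokens that gets masked as a unit. The sketch below illustrates this idea under simplified assumptions (the `mask_spans` helper, span boundaries, and masking probability are illustrative, not SeBERT's actual implementation):

```python
import random

MASK = "[MASK]"

def mask_spans(tokens, spans, mask_prob=0.15, rng=None):
    """Mask whole spans (word-, phrase-, or entity-level) as units,
    rather than only individual tokens."""
    rng = rng or random.Random(0)
    out = list(tokens)
    for start, end in spans:
        if rng.random() < mask_prob:
            for i in range(start, end):
                out[i] = MASK
    return out

tokens = ["Xiao", "Ming", "bought", "5", "apples"]
# Word-level: every token is its own span.
word_spans = [(i, i + 1) for i in range(len(tokens))]
# Entity-level: the named entity "Xiao Ming" is masked as one unit.
entity_spans = [(0, 2)]

masked = mask_spans(tokens, entity_spans, mask_prob=1.0)
print(masked)  # ['[MASK]', '[MASK]', 'bought', '5', 'apples']
```

Masking the entity as a single span forces the model to recover "Xiao Ming" from the surrounding numerical and logical context, rather than predicting each token independently.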

Continuous Semantic Integration
SeBERT further refines semantic understanding through three pre-training tasks:
• Word-Related Tasks:

Capitalization Prediction (CP): Identifies capitalized words in English problems, which often carry special significance.

Keyword Prediction (KP): Recognizes high-frequency keywords to improve focus on critical problem elements.

• Structure-Related Tasks:

Sentence Reordering (SR): Shuffles sub-clauses to train the model on logical sentence reconstruction.

Sentence Position Judgment (SPJ): Determines whether sentences are adjacent or unrelated, strengthening contextual awareness.

• Semantic-Related Tasks:

Discourse Relation Prediction (DRP): Analyzes inter-sentence relationships to infer deeper semantic connections.

These strategies collectively enable SeBERT to capture nuanced semantic and structural patterns in MWPs, significantly reducing comprehension errors.
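As a concrete illustration of one structure-related task, a Sentence Reordering (SR) training pair can be built by shuffling a problem's sub-clauses and recording the permutation the model must recover. This is a minimal sketch under assumed conventions (the helper name and clause granularity are illustrative):

```python
import random

def make_sr_example(clauses, rng=None):
    """Build a Sentence Reordering (SR) training pair: shuffle the
    sub-clauses and keep the permutation as the recovery target."""
    rng = rng or random.Random(42)
    order = list(range(len(clauses)))
    rng.shuffle(order)
    shuffled = [clauses[i] for i in order]
    return shuffled, order  # model input, target permutation

clauses = ["Xiao Ming bought 5 apples",
           "each apple costs 2 yuan",
           "how much did he spend"]
shuffled, target = make_sr_example(clauses)

# Sorting the shuffled clauses by the target permutation restores
# the original logical order.
restored = [c for _, c in sorted(zip(target, shuffled))]
print(restored == clauses)  # True
```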

2. The SeBERT-PT Solving Framework

The solving pipeline consists of three core components:

Semantic Encoding with SeBERT
The input problem text is tokenized and converted into embeddings, which SeBERT processes to generate hidden state vectors. These vectors encode bidirectional contextual information, capturing dependencies between words, phrases, and entities.

Pooling Layer for Semantic Aggregation
To consolidate semantic features, an average pooling layer aggregates hidden states into a fixed-length representation. Unlike max pooling, which may discard critical details, average pooling preserves background knowledge and reduces noise, enhancing the model’s ability to retain problem-specific information.
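The aggregation step can be sketched as follows: average the token-level hidden states into one fixed-length vector, skipping padding positions. This is a dependency-free illustration of the operation, not SeBERT-PT's actual layer:

```python
def average_pool(hidden_states, attention_mask):
    """Average token-level hidden states into one fixed-length vector,
    ignoring padding positions (mask == 0)."""
    dim = len(hidden_states[0])
    total = [0.0] * dim
    count = 0
    for vec, m in zip(hidden_states, attention_mask):
        if m:
            count += 1
            for j in range(dim):
                total[j] += vec[j]
    return [t / count for t in total]

# Two real tokens plus one padding row (masked out).
states = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
pooled = average_pool(states, [1, 1, 0])
print(pooled)  # [2.0, 3.0]
```

Because every unmasked token contributes equally, no single token's features dominate the pooled representation, which is the "noise reduction" property contrasted with max pooling above.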

Tree-Based Decoding for Equation Generation
Traditional sequence-to-sequence (seq2seq) models often produce invalid or redundant equations. To address this, SeBERT-PT adopts a seq2tree decoder that constructs expression trees hierarchically:

  1. The decoder predicts nodes (operators or numbers) recursively, starting from the root.
  2. Each node’s probability is computed using a gating mechanism to decide between operators and numerical values.
  3. The tree is traversed in pre-order to generate a valid equation, which is then evaluated to obtain the final answer.

This structured approach ensures syntactically correct equations and eliminates redundancies (e.g., “x=3+9+2-1” vs. “x=9+2-1+3”).
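The payoff of decoding into a tree rather than a token sequence is that the result is an expression structure that is valid by construction. The sketch below evaluates such a tree and renders its infix form; the tuple representation is an illustrative stand-in for the decoder's actual node objects:

```python
def eval_tree(node):
    """Evaluate a binary expression tree of the kind a seq2tree
    decoder produces: leaves are numbers, internal nodes operators."""
    if not isinstance(node, tuple):
        return node
    op, left, right = node
    l, r = eval_tree(left), eval_tree(right)
    return {"+": l + r, "-": l - r, "*": l * r, "/": l / r}[op]

def to_infix(node):
    """Render the tree as a fully parenthesized infix equation."""
    if not isinstance(node, tuple):
        return str(node)
    op, left, right = node
    return f"({to_infix(left)} {op} {to_infix(right)})"

# Tree for the case-study problem: (5 * 2) + (3 * 3)
tree = ("+", ("*", 5, 2), ("*", 3, 3))
print(to_infix(tree), "=", eval_tree(tree))  # ((5 * 2) + (3 * 3)) = 19
```

Because operators always receive exactly two resolved operands, malformed sequences such as "5 + * 2" simply cannot be represented, which is why tree decoding eliminates the invalid equations that seq2seq models sometimes emit.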

3. Confidence-Based Judgment Mechanism

Despite rigorous training, neural models may still produce flawed predictions. To mitigate this, a confidence estimation branch is added to the decoder:
• The model outputs a confidence score (0 to 1) for each predicted equation.

• Predictions with scores below a threshold are discarded as untrustworthy, avoiding unnecessary computations.

• The threshold is determined empirically during training by analyzing the distribution of confidence scores for correct vs. incorrect predictions.

This mechanism improves training efficiency by focusing resources on high-confidence solutions while maintaining high accuracy.
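The filtering step reduces to a simple threshold comparison on each prediction's confidence score. A minimal sketch, assuming predictions arrive as (equation, score) pairs and using an illustrative threshold of 0.5:

```python
def filter_by_confidence(predictions, threshold=0.5):
    """Split equation predictions into trusted and discarded sets
    based on a confidence threshold in [0, 1]."""
    kept, dropped = [], []
    for equation, score in predictions:
        (kept if score >= threshold else dropped).append((equation, score))
    return kept, dropped

preds = [("x = 5*2 + 3*3", 0.95),   # high confidence: kept
         ("x = 5 + 2 + 3", 0.31)]   # low confidence: discarded
kept, dropped = filter_by_confidence(preds, threshold=0.5)
print(len(kept), len(dropped))  # 1 1
```

In practice, as the text notes, the threshold is not fixed in advance but chosen empirically from the score distributions of correct versus incorrect predictions on training data.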

Experimental Results

The proposed method was evaluated on Chinese (Math23K, Ape210k) and English (MathQA, MAWPS) datasets, achieving state-of-the-art accuracy:
• Chinese: 85.7% (Math23K), 83.5% (Ape210k).

• English: 77.9% (MathQA), 89.0% (MAWPS).

Key Findings:

  1. Pre-Training Effectiveness: Models pre-trained on domain-specific corpora (e.g., Ape210k for Chinese) outperformed non-pre-trained baselines by ~10%.
  2. Multi-Granularity Masking: Entity-level masking contributed the most to accuracy (+3.4% over phrase-level masking).
  3. Pooling Strategy: Average pooling outperformed max pooling by 1%, as it better retained contextual semantics.
  4. Tree Decoders: Tree-based decoding improved accuracy by 5–8% compared to seq2seq models, validating its superiority in generating valid equations.
  5. Confidence Mechanism: Training time reduced by 35% (from 17.5 to 11.3 hours) without compromising accuracy.

Case Study

Consider the problem:
“Xiao Ming bought 5 apples at ¥2 each and 3 oranges at ¥3 each. How much did he spend in total?”

  1. Semantic Analysis: SeBERT identifies “apples,” “oranges,” and their quantities/prices.
  2. Equation Tree: The decoder generates a tree for “Total = (5 × 2) + (3 × 3).”
  3. Confidence Check: The high confidence score (e.g., 0.95) validates the solution.
  4. Answer: The model outputs “19” as the total cost.
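The arithmetic underlying the generated equation tree checks out directly:

```python
# Total = (5 apples x 2 yuan) + (3 oranges x 3 yuan)
apples, apple_price = 5, 2
oranges, orange_price = 3, 3
total = apples * apple_price + oranges * orange_price
print(total)  # 19
```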

Conclusion

This work advances MWP solving by integrating semantic augmentation, hierarchical pooling, and tree-structured decoding. SeBERT’s multi-granularity masking and continuous semantic tasks address the limitations of conventional PLMs, while the confidence mechanism ensures efficient and reliable solving. Future directions include improving model interpretability and extending the framework to multi-lingual and multi-modal problem sets.

doi.org/10.19734/j.issn.1001-3695.2024.06.0208
