Question-Oriented Prompt-Tuning for Few-Shot Text Classification

Introduction

Text classification is a fundamental task in natural language processing (NLP), with applications ranging from sentiment analysis to topic categorization. Traditional approaches rely on fine-tuning pre-trained language models (PLMs) with task-specific classifiers. However, these methods require substantial labeled data to achieve optimal performance, which is often unavailable in real-world scenarios such as medical diagnostics, legal document analysis, or emerging social media trends. Few-shot learning (FSL) addresses this challenge by enabling models to generalize from a limited number of labeled examples.

Among FSL techniques, prompt-tuning has emerged as a powerful alternative to conventional fine-tuning. Instead of adapting PLMs with additional classification layers, prompt-tuning reformulates the classification task as a cloze-style (fill-in-the-blank) problem, aligning it with the pre-training objective of masked language models (MLMs). This approach leverages the inherent knowledge of PLMs more effectively, particularly in low-resource settings. However, prompt-tuning faces two critical challenges: (1) designing optimal prompt templates and (2) constructing accurate label word mappings (verbalizers). Manual template design is labor-intensive and sensitive to minor phrasing changes, while traditional verbalizers often suffer from limited coverage of label-associated words.

To address these limitations, this paper introduces Question-Oriented Prompt-Tuning (QPT), a novel few-shot text classification method. QPT automates template construction by framing prompts as questions derived from dataset labels, followed by trainable continuous prompts that guide the model’s responses. Additionally, it enhances verbalizers using external knowledge bases, expanding the range of label-related words. Experiments on AG’s News and IMDB datasets demonstrate that QPT outperforms existing baselines, particularly in 5-shot, 10-shot, and 20-shot scenarios.

Background and Related Work

Few-Shot Learning in NLP

Few-shot learning aims to train models with minimal labeled data. Existing FSL methods fall into three categories:

  1. Data Augmentation: Generates synthetic training samples to improve diversity. However, augmented data may not capture novel patterns.
  2. Model Fine-Tuning: Adapts pre-trained models to target tasks but risks overfitting when domain shifts occur.
  3. Metric Learning: Measures similarity between samples using prototypes or memory networks. Performance heavily depends on the chosen distance metrics and data distribution.

While these methods have advanced FSL, they often struggle to generalize from sparse data. PLMs, with their vast pre-trained knowledge, offer a promising alternative.

Prompt-Tuning Paradigm

Prompt-tuning bridges the gap between pre-training and downstream tasks by reformulating classification as an MLM task. For example, classifying a movie review as “positive” or “negative” can be framed as: “This movie was amazing! Does the above text belong to [MASK]?” The model predicts words like “positive” or “negative” for [MASK], which are then mapped to labels.

Early approaches like PET relied on handcrafted templates, which were brittle and required extensive validation. P-tuning replaced discrete prompts with continuous embeddings optimized via gradient descent, but these embeddings lacked interpretability and demanded larger datasets for training. Knowledgeable Prompt-Tuning (KPT) enriched verbalizers using external knowledge (e.g., WordNet) to include synonyms and related terms, mitigating coverage issues.

Despite progress, these methods either depend on manual effort or fail to fully exploit PLMs’ reasoning capabilities. QPT innovates by combining question-based templates, continuous prompt optimization, and knowledge-augmented verbalizers.

Methodology

QPT consists of three stages: (1) template construction, (2) MLM-based prediction, and (3) label mapping.

Question-Oriented Template Construction

Traditional templates are static and sensitive to phrasing. QPT dynamically generates prompts by:

  1. Question Formulation: Converts dataset labels into natural questions. For AG’s News (labels: World, Sports, Business, Sci/Tech), the template becomes: “Does the above text belong to [World/Sports/Business/Sci/Tech]? [MASK].”
  2. Continuous Prompt Optimization: Appends trainable embeddings after the question, initialized via a bidirectional LSTM and MLP. These embeddings adapt during training to refine the model’s response behavior.

This hybrid approach balances interpretability (via questions) and flexibility (via trainable embeddings), reducing reliance on manual design.
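The two-step construction above can be sketched as follows. This is a minimal illustration, not the paper’s implementation: the `build_template` helper and the `[P1]…[Pn]` pseudo-token notation are assumptions; in QPT those positions would hold continuous embeddings initialized by a bidirectional LSTM and MLP and updated by gradient descent.

```python
def build_template(text, labels, n_prompt_tokens=4):
    """Build a question-oriented prompt: the input text, a question
    formed from the dataset's labels, a run of trainable pseudo-tokens,
    and a [MASK] slot for the MLM to fill.

    The [P1]..[Pn] placeholders mark where continuous prompt embeddings
    would sit during training; here they are plain string markers.
    """
    question = f"Does the above text belong to [{'/'.join(labels)}]?"
    prompt_tokens = " ".join(f"[P{i + 1}]" for i in range(n_prompt_tokens))
    return f"{text} {question} {prompt_tokens} [MASK]."

template = build_template(
    "Stocks rallied after the earnings report.",
    ["World", "Sports", "Business", "Sci/Tech"],
)
print(template)
```

Because the question is generated directly from the label set, switching datasets only requires swapping in the new labels; no hand-tuned phrasing is involved.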

Masked Language Model Prediction

The filled template is processed by RoBERTa-large, a BERT variant pre-trained as an MLM with dynamic masking and larger batches. RoBERTa predicts the [MASK] token by evaluating the likelihood of candidate words (e.g., “science” for Sci/Tech). Unlike fine-tuning, which uses [CLS] tokens for classification, QPT directly leverages the MLM’s pre-trained reasoning ability.
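The prediction step can be sketched as below, assuming the MLM’s scores at the [MASK] position are already available. The toy logits and the `predict_label` helper are illustrative assumptions; a real setup would take these scores from RoBERTa’s output over its vocabulary.

```python
import math

def predict_label(mask_logits, candidates):
    """Given MLM logits at the [MASK] position (word -> score) and a
    mapping from candidate label words to labels, return the predicted
    label and probabilities normalized over the candidate words only.
    """
    words = list(candidates)
    # Softmax restricted to the candidate label words.
    exps = [math.exp(mask_logits[w]) for w in words]
    total = sum(exps)
    probs = {w: e / total for w, e in zip(words, exps)}
    best_word = max(probs, key=probs.get)
    return candidates[best_word], probs

# Toy scores a masked LM might assign at the [MASK] position.
logits = {"science": 4.1, "sports": 1.2, "business": 2.7, "world": 0.9}
label, probs = predict_label(
    logits,
    {"world": "World", "sports": "Sports",
     "business": "Business", "science": "Sci/Tech"},
)
print(label)  # → Sci/Tech
```

Restricting the softmax to the verbalizer’s candidate words is what turns an open-vocabulary MLM prediction into a classification decision.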

Knowledge-Augmented Verbalizer

Standard verbalizers map single words (e.g., “sports” → Sports), ignoring semantically related terms. QPT expands verbalizers using external knowledge:

  1. Label Word Expansion: Retrieves synonyms and hypernyms from lexical databases (e.g., “sports” → {“athletics”, “competition”, “soccer”}).
  2. Noise Reduction:

• Relevance Filtering: Removes low-relevance words using TF-IDF-inspired scoring.

• Learnable Weighting: Assigns trainable weights to label words, downweighting noisy terms during training.

For IMDB sentiment analysis, the “Positive” verbalizer includes words like “acclaimed” and “accomplishment,” while “Negative” covers “adverse” and “apathy.” This ensures robust coverage without manual curation.
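The weighted aggregation in step 2 might look like the following sketch. The softmax-normalized weights, the `label_score` helper, and the toy probabilities are illustrative assumptions standing in for the paper’s exact formulation; in training, the raw weights would be learned by gradient descent.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of raw weights."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def label_score(word_probs, label_words, weights):
    """Aggregate MLM probabilities of a label's expanded word set into
    a single label score, weighting each word by a (trainable) weight.

    word_probs:  word -> MLM probability at the [MASK] position
    label_words: expanded words mapped to this label
    weights:     raw trainable weights, softmax-normalized here
    """
    norm = softmax(weights)
    return sum(w * word_probs.get(word, 0.0)
               for word, w in zip(label_words, norm))

# Toy example: a low raw weight pushes the noisy term toward zero.
positive_words = ["acclaimed", "accomplishment", "apathy"]  # last is noise
word_probs = {"acclaimed": 0.30, "accomplishment": 0.25, "apathy": 0.05}
score = label_score(word_probs, positive_words, weights=[1.0, 1.0, -2.0])
```

With all weights equal the score reduces to a plain average over the expanded word set; training moves weight away from noisy expansions like “apathy” appearing under the wrong label.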

Experiments

Datasets and Baselines

QPT was evaluated on:

• AG’s News: 4-class news topic classification.

• IMDB: Binary sentiment analysis (positive/negative).

Comparisons included:

  1. Fine-tuning: RoBERTa with a linear classifier.
  2. PET: Manual templates + handcrafted verbalizers.
  3. P-tuning: Continuous prompts without questions.
  4. KPT: Knowledge-enhanced verbalizers with manual templates.

Results

QPT achieved consistent improvements across few-shot settings:

• AG’s News: +0.81% (5-shot), +0.56% (10-shot), +0.51% (20-shot) accuracy over KPT.

• IMDB: +1.36% (5-shot), +1.23% (10-shot), +1.17% (20-shot) accuracy.

Notably, QPT’s gains were most pronounced in the 5-shot scenario, highlighting its effectiveness in extreme low-resource conditions. Fine-tuning performed poorly due to overfitting, while PET and P-tuning were outperformed by QPT’s question-driven prompts and knowledge-augmented verbalizers.

Ablation Study

Removing key components degraded performance:

• No Questions: Accuracy dropped by 0.81% (AG’s News) and 1.01% (IMDB).

• No Continuous Prompts: Larger declines of 3.2% and 1.1%, underscoring their role in optimizing responses.

• No Knowledgeable Verbalizer: Led to inferior coverage and noise sensitivity.

Applications and Future Directions

QPT is applicable to any few-shot text classification task, such as:

• Medical Diagnostics: Classifying rare diseases with minimal case reports.

• Emerging Trends: Detecting new topics in social media with sparse labels.

Future work could extend QPT to:

  1. Text Generation: Adapting question-based prompts for summarization or dialogue.
  2. Multimodal Tasks: Integrating visual or auditory cues for cross-modal classification.

Conclusion

QPT advances few-shot text classification by automating prompt design and enhancing verbalizers. Its question-oriented templates and knowledge-augmented mappings outperform existing methods while reducing manual effort. By better harnessing PLMs’ capabilities, QPT offers a scalable solution for low-resource NLP applications.

For further details, refer to the original paper: https://doi.org/10.19734/j.issn.1001-3695.2024.07.0259
