Online Continual Learning Method Strengthening Decision Boundaries and Self-Supervision
Introduction
Online continual learning (CL) in the context of class-incremental learning (CIL) aims to develop deep learning models capable of accumulating knowledge from new classes while retaining information learned from previous classes. A critical challenge in this setting is catastrophic forgetting (CF), where the model’s performance on previously learned classes deteriorates as it adapts to new data. This issue arises because historical knowledge from old data tends to be overwritten by new information.
Replay-based methods have demonstrated strong performance in mitigating catastrophic forgetting by storing subsets of old-class data for rehearsal during training. However, many existing models in this category tend to learn object-agnostic solutions, i.e., features that are not strongly tied to specific classes, which generalize poorly and are easily forgotten. Learning representative features that best capture class characteristics is therefore crucial for addressing catastrophic forgetting.
This paper introduces an online continual learning method that strengthens decision boundaries and incorporates self-supervision. The approach enhances classification performance by reinforcing decision boundaries between new and old classes while leveraging self-supervised learning to extract more discriminative and generalizable features. Experiments on standard benchmarks (CIFAR-10, CIFAR-100, and TinyImageNet) demonstrate that the proposed method achieves higher accuracy and lower forgetting rates than existing approaches.
Background and Related Work
Catastrophic Forgetting in Continual Learning
Catastrophic forgetting occurs when a model trained on sequential tasks loses performance on earlier tasks as it learns new ones. This phenomenon is particularly problematic in online CL, where data arrives in a single-pass stream, and the model must adapt incrementally without revisiting past data. Traditional deep learning models, when trained on new tasks, tend to overwrite previously learned weights, leading to a rapid decline in performance on old tasks.
Replay-Based Methods
Replay-based approaches address catastrophic forgetting by storing a subset of past data in a memory buffer and replaying it alongside new data during training. Experience Replay (ER) rehearses randomly drawn buffer samples, while Maximally Interfered Retrieval (MIR) retrieves the stored samples most susceptible to interference from the current update, improving retention of old knowledge. Adversarial Shapley Value Experience Replay (ASER) employs a Shapley value-based buffer management strategy to optimize sample selection. Supervised Contrastive Replay (SCR) uses contrastive learning to enhance feature discrimination, while Online Continual Learning through Mutual Information Maximization (OCM) maximizes mutual information to prevent forgetting.
Despite their effectiveness, many replay-based methods still struggle with learning class-specific features, often favoring simpler, less generalizable solutions.
Knowledge Distillation in Continual Learning
Knowledge distillation has been widely adopted to mitigate catastrophic forgetting by transferring knowledge from a previous model to the current one. Incremental Classifier and Representation Learning (iCaRL) uses distillation to preserve old knowledge while learning new classes. Contrastive Continual Learning (Co²L) employs self-distillation to retain features, and Prototype Augmentation and Self-Supervision (PASS) maintains old-class decision boundaries by preserving prototypes. However, these methods can be limited when the initial model performs poorly, as the distilled knowledge may not be sufficiently informative.
Prototype-Based Learning
Prototypes—representative feature vectors for each class—have been used to reduce forgetting. iCaRL and SCR employ prototypes for classification, while PASS retains old prototypes to preserve learned knowledge. However, computing prototypes for all samples is computationally expensive. Some approaches, such as Continual Prototype Evolution (CoPE), use momentum-based updates for prototypes, while others estimate class prototypes incrementally. The proposed method introduces an online prototype framework that dynamically updates prototypes using only current batch data, reducing computational overhead.
Contrastive Learning
Contrastive learning aims to learn an embedding space where similar samples are clustered together, and dissimilar samples are separated. In self-supervised settings, similarity is determined through data augmentation, while supervised contrastive learning uses class labels. Instance-level contrastive learning pulls augmented views of the same sample closer while pushing apart different samples. However, this approach can struggle with class imbalance. Proxy-based contrastive learning addresses this by introducing proxy samples to balance training. The proposed method combines both approaches to enhance feature learning.
Proposed Method
Problem Formulation
In online class-incremental learning, the model receives a stream of data batches, each containing samples from new classes. The goal is to classify both new and old classes without task identifiers during inference. The model is trained on batches consisting of new data and samples retrieved from a memory buffer storing past data.
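As a concrete illustration, the PyTorch-style sketch below shows the single-pass training loop implied by this formulation. The buffer interface (sample, update) and the placeholder cross-entropy loss are assumptions for illustration only, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def online_cil_stream(model, buffer, stream_loader, optimizer, replay_size=64):
    """Single-pass online CIL: each incoming batch is seen once and is trained
    jointly with old-class samples retrieved from the memory buffer."""
    for x_new, y_new in stream_loader:              # new-class data arrives as a stream
        x_buf, y_buf = buffer.sample(replay_size)   # rehearse stored old-class data
        x = torch.cat([x_new, x_buf], dim=0)
        y = torch.cat([y_new, y_buf], dim=0)

        loss = F.cross_entropy(model(x), y)         # placeholder; the paper's loss differs
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        buffer.update(x_new, y_new)                 # store a subset of the new batch
```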
Framework Overview
The proposed method consists of two main components:
- Strengthening Decision Boundaries (SDB): Dynamically adjusts the influence of new and old data during training to enhance class separation.
- Self-Supervised Contrastive Learning (SSCL): Combines instance-level and proxy-based contrastive learning to improve feature representation.
The overall framework processes incoming data and replay samples, updates prototypes, and optimizes model parameters using a combined loss function.
Strengthening Decision Boundaries
To address data imbalance between new and old classes, the method introduces a loss function that balances their contributions:
• The first term ensures that the model maintains decision boundaries for old classes.
• The second term enforces separation between new and old classes.
By dynamically adjusting the weights of these terms, the model prevents excessive bias toward new classes while preserving old knowledge.
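The exact loss is defined in the paper and is not reproduced here; as a sketch, a two-term objective of the following general shape matches the description above, with dynamically adjusted weights \alpha_t and \beta_t:

```latex
\mathcal{L}_{\mathrm{SDB}}
  = \alpha_t \, \mathcal{L}_{\mathrm{old}}
  + \beta_t \, \mathcal{L}_{\mathrm{sep}}
```

Here \mathcal{L}_{\mathrm{old}} acts on replayed samples to maintain old-class decision boundaries, and \mathcal{L}_{\mathrm{sep}} penalizes overlap between new-class and old-class predictions.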
Self-Supervised Contrastive Learning
The method integrates two contrastive learning strategies:
- Instance-Level Contrastive Learning: Encourages the model to pull together augmented views of the same sample while pushing apart different samples.
- Proxy-Based Contrastive Learning: Introduces proxy samples to mitigate class imbalance and improve generalization.
The combined loss function enhances feature discrimination and robustness.
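A minimal sketch of such a combination is given below, assuming a SimCLR-style instance-level term and class prototypes used as proxies; the paper's exact loss may differ in how proxies are constructed and how the two terms are weighted.

```python
import torch
import torch.nn.functional as F

def instance_contrastive_loss(z1, z2, temperature=0.1):
    """Instance-level term: two augmented views of each sample are positives;
    every other sample in the batch acts as a negative."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                              # (2N, d)
    sim = z @ z.t() / temperature                               # cosine similarities
    n = z1.size(0)
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool, device=z.device), float('-inf'))
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

def proxy_contrastive_loss(z, labels, prototypes, temperature=0.1):
    """Proxy-based term: each sample is pulled toward the proxy of its own class
    (row `labels[i]` of `prototypes`) and pushed away from the other proxies."""
    z = F.normalize(z, dim=1)
    p = F.normalize(prototypes, dim=1)                          # (C, d), row c = proxy of class c
    logits = z @ p.t() / temperature                            # (N, C)
    return F.cross_entropy(logits, labels)

def sscl_loss(z1, z2, labels, prototypes):
    """Combined objective; equal weighting of the two terms is an assumption."""
    return instance_contrastive_loss(z1, z2) + proxy_contrastive_loss(z1, labels, prototypes)
```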
Online Prototype Learning
The online prototype framework updates prototypes using only current batch data, reducing computational cost. Prototypes are computed by averaging feature representations of samples belonging to each class. The contrastive loss between prototypes and their augmented views further refines feature learning.
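A minimal sketch of this per-batch update is shown below, with prototypes stored in a dictionary keyed by class index; the momentum blend with the previously stored prototype is an added assumption, as the paper may use a different update rule.

```python
import torch

def update_prototypes(features, labels, prototypes, momentum=0.9):
    """Update prototypes from the current batch only: average the features of
    each class present in the batch and blend with the stored prototype."""
    for c in labels.unique():
        class_mean = features[labels == c].mean(dim=0).detach()
        key = int(c)
        if key in prototypes:
            prototypes[key] = momentum * prototypes[key] + (1 - momentum) * class_mean
        else:
            prototypes[key] = class_mean
    return prototypes
```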
Training and Optimization
The total loss function combines:
• Decision boundary strengthening loss
• Self-supervised contrastive loss
• Online prototype learning loss
The model is optimized with gradient-based updates (Adam in the experiments), and the memory buffer is updated following a uniform sampling strategy.
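The sketch below illustrates one optimization step over the combined objective and a uniform buffer update, realized here with reservoir sampling (a common way to implement uniform updates); the loss weights are illustrative, not values from the paper.

```python
import random

def training_step(optimizer, loss_sdb, loss_sscl, loss_proto,
                  lambda_sscl=1.0, lambda_proto=1.0):
    """One step over the combined loss; the lambda_* weights are assumptions."""
    loss = loss_sdb + lambda_sscl * loss_sscl + lambda_proto * loss_proto
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def reservoir_update(buffer, x_new, y_new, capacity, n_seen):
    """Reservoir sampling: every streamed example is retained in the buffer
    with equal probability, giving a uniform sample of the stream."""
    for x, y in zip(x_new, y_new):
        if len(buffer) < capacity:
            buffer.append((x, y))
        else:
            j = random.randint(0, n_seen)       # inclusive upper bound
            if j < capacity:
                buffer[j] = (x, y)
        n_seen += 1
    return n_seen
```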
Experiments and Results
Datasets and Evaluation Metrics
Experiments were conducted on the CIFAR-10, CIFAR-100, and TinyImageNet datasets. Performance was measured using two standard metrics (formal definitions are sketched after this list):
• Average Accuracy: Mean classification accuracy across all tasks.
• Average Forgetting: Degree of performance drop on previous tasks after learning new ones.
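With a_{T,i} denoting the accuracy on task i after training on the T-th task, the standard definitions of these metrics (the paper may state them with minor notational differences) are:

```latex
A_T = \frac{1}{T} \sum_{i=1}^{T} a_{T,i},
\qquad
F_T = \frac{1}{T-1} \sum_{i=1}^{T-1} \max_{l \in \{1,\dots,T-1\}} \left( a_{l,i} - a_{T,i} \right)
```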
Implementation Details
The model uses ResNet-18 as the backbone, with a projection layer for contrastive learning. Training employs the Adam optimizer with a learning rate of 5e-4 and weight decay of 1e-4. Batch size and replay batch size are set to 10 and 64, respectively.
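An illustrative PyTorch setup matching these reported hyperparameters is sketched below; the projection-head architecture and feature dimensions are assumptions, since the paper only states that a projection layer is used for contrastive learning.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# Hypothetical model assembly: ResNet-18 backbone followed by a small projection head.
backbone = resnet18(num_classes=512)                 # 512-d feature extractor
projector = nn.Sequential(nn.Linear(512, 512), nn.ReLU(inplace=True), nn.Linear(512, 128))
model = nn.Sequential(backbone, projector)

# Reported training hyperparameters.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4, weight_decay=1e-4)
stream_batch_size, replay_batch_size = 10, 64
```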
Comparative Analysis
The proposed method outperforms existing approaches across different memory buffer sizes:
• On CIFAR-10 (M=100), it achieves 60.8% accuracy with 15.5% forgetting.
• On CIFAR-100 (M=500), it reaches 25.9% accuracy with 13.7% forgetting.
Notably, the method shows significant improvements when memory is limited, demonstrating its efficiency in resource-constrained settings.
Ablation Studies
Ablation experiments confirm the contributions of each component:
• Decision Boundary Strengthening: Improves accuracy by enforcing class separation.
• Self-Supervised Contrastive Learning: Enhances feature discrimination.
The combined approach yields the best results, validating the synergy between the two strategies.
Discussion
The proposed method effectively mitigates catastrophic forgetting by reinforcing decision boundaries and leveraging self-supervised learning. Key advantages include:
• Robust Feature Learning: The fusion of contrastive learning strategies improves feature generalization.
• Dynamic Class Separation: Adjusting the influence of new and old data prevents bias and enhances stability.
• Computational Efficiency: Online prototype updates reduce memory and computational costs.
However, challenges remain in handling extreme class imbalance, where gradient conflicts between old and new classes can still occur. Future work may explore adaptive gradient balancing to further enhance performance.
Conclusion
This paper presents an online continual learning method that strengthens decision boundaries and incorporates self-supervised contrastive learning. By dynamically balancing new and old class contributions and refining feature representations, the method achieves superior accuracy and reduced forgetting compared to existing approaches. Experimental results on benchmark datasets validate its effectiveness, particularly in memory-constrained scenarios. Future research will focus on optimizing gradient-based learning to address class imbalance more effectively.
doi.org/10.19734/j.issn.1001-3695.2024.06.0191