Semi-Supervised 3D Object Detection Based on Mining Valuable Potential Samples
Introduction
Three-dimensional object detection is a critical technology in autonomous driving, enabling vehicles to perceive and understand their surroundings. While fully supervised 3D object detection methods have achieved significant progress, they rely heavily on large-scale labeled datasets, which are expensive and time-consuming to obtain. To address this limitation, semi-supervised learning has emerged as a promising approach, leveraging a small amount of labeled data alongside a large pool of unlabeled data to train robust models.
Traditional semi-supervised 3D object detection methods often employ fixed thresholds to filter pseudo-labels generated by a teacher model. However, these fixed thresholds are inflexible, leading to either the exclusion of potentially valuable pseudo-labels or the inclusion of low-quality ones. This paper introduces a novel semi-supervised 3D object detection framework that dynamically generates adaptive thresholds, improves pseudo-label quality through joint confidence filtering, and enhances model performance by leveraging dense and soft pseudo-labels for underrepresented categories.
Challenges in Semi-Supervised 3D Object Detection
Current semi-supervised 3D object detection methods face several key challenges:
- Inflexible Threshold Selection – Fixed thresholds for pseudo-label filtering often fail to account for varying detection performance across different object categories. High thresholds discard useful pseudo-labels, while low thresholds introduce noise into training.
- Underutilization of Low-Confidence Samples – Many methods discard low-confidence predictions, ignoring potentially valuable samples that could improve model generalization.
- Class Imbalance – Categories with fewer samples, such as pedestrians and cyclists, often suffer from poor detection performance due to insufficient labeled data.
To overcome these challenges, this paper proposes three key innovations: adaptive threshold generation, joint confidence filtering, and dense pseudo-label generation with soft pseudo-labels.
Methodology
Adaptive Threshold Generation
Instead of using a fixed threshold for pseudo-label filtering, this work introduces an adaptive threshold generation method based on score clustering. The approach involves:
- Score Clustering – The teacher model predicts bounding boxes on labeled data, and the confidence scores (objectness, classification, and IoU) are clustered using K-means++.
- Dynamic Threshold Assignment – The centroids of these clusters serve as category-specific thresholds, allowing different object classes to have tailored filtering criteria.
- Iterative Refinement – As the model improves during training, the thresholds are updated to reflect the evolving detection performance.
This adaptive approach ensures that pseudo-labels are filtered more accurately, retaining high-quality predictions while discarding unreliable ones.
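The threshold-generation steps above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each class has a flat array of teacher confidence scores collected on labeled data, uses two clusters, and takes the highest centroid as that class's threshold. The function names `kmeans_1d` and `adaptive_thresholds` are hypothetical.

```python
import numpy as np

def kmeans_1d(scores, k=2, iters=50, seed=0):
    """Tiny 1-D k-means with k-means++ seeding; returns centroids sorted ascending."""
    rng = np.random.default_rng(seed)
    x = np.asarray(scores, dtype=float)
    # k-means++ seeding: first centre is a random point, each later centre is
    # drawn with probability proportional to squared distance to the nearest centre.
    centers = [rng.choice(x)]
    for _ in range(k - 1):
        d2 = np.min((x[:, None] - np.array(centers)[None, :]) ** 2, axis=1)
        probs = d2 / d2.sum() if d2.sum() > 0 else np.full(len(x), 1.0 / len(x))
        centers.append(rng.choice(x, p=probs))
    centers = np.array(centers, dtype=float)
    # Lloyd iterations: assign each score to its nearest centre, then recompute means.
    for _ in range(iters):
        assign = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        new = np.array([x[assign == j].mean() if np.any(assign == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return np.sort(centers)

def adaptive_thresholds(scores_by_class, k=2):
    """One threshold per class. Assumption: the highest centroid is the threshold."""
    return {name: float(kmeans_1d(s, k)[-1]) for name, s in scores_by_class.items()}

thresholds = adaptive_thresholds({
    "Car": [0.90, 0.92, 0.88, 0.20, 0.25, 0.15],
    "Cyclist": [0.60, 0.62, 0.10, 0.12],
})
print(thresholds)
```

Re-running `adaptive_thresholds` on fresh teacher predictions at intervals during training would correspond to the iterative refinement step; how often the thresholds are refreshed is a design choice this sketch leaves open.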
Joint Confidence Filtering
Pseudo-labels consist of both class and bounding box information, yet traditional methods often rely on a single confidence score (e.g., classification or IoU) for filtering. This can lead to misaligned predictions where high classification confidence does not guarantee accurate localization.
To address this, the proposed method employs a joint confidence filtering strategy, combining:
- Objectness Confidence – Measures whether a detection contains a valid object.
- Classification Confidence – Indicates the certainty of the predicted class.
- IoU Confidence – Reflects the accuracy of the predicted bounding box.
Only predictions where the product of these three confidence scores exceeds the adaptive threshold are retained as pseudo-labels. This multi-faceted filtering improves pseudo-label quality and reduces false positives.
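As a rough illustration of the rule just described, the sketch below multiplies the three scores and compares the product with a per-class threshold. The function name `joint_confidence_filter`, the array layout, and the example threshold values are assumptions made for illustration only.

```python
import numpy as np

def joint_confidence_filter(obj, cls, iou, labels, thresholds):
    """Return a keep mask and the joint score for each predicted box.

    obj, cls, iou: per-box confidence scores in [0, 1]
    labels: predicted class name for each box
    thresholds: dict mapping class name to its adaptive threshold (hypothetical values)
    """
    obj, cls, iou = (np.asarray(a, dtype=float) for a in (obj, cls, iou))
    joint = obj * cls * iou                       # product of the three confidences
    thr = np.array([thresholds[name] for name in labels])
    return joint > thr, joint

keep, joint = joint_confidence_filter(
    obj=[0.9, 0.9, 0.5],
    cls=[0.9, 0.9, 0.9],
    iou=[0.9, 0.5, 0.9],
    labels=["Car", "Car", "Cyclist"],
    thresholds={"Car": 0.6, "Cyclist": 0.3},
)
print(keep, joint)
```

In this example the second box has high objectness and classification confidence but a low IoU confidence, so its joint score falls below the car threshold and it is rejected. The third box has the same joint score but passes because the cyclist threshold is lower.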
Dense and Soft Pseudo-Labels
For underrepresented categories (e.g., pedestrians and cyclists), the scarcity of labeled data makes learning robust features difficult. To mitigate this, the proposed method introduces two strategies:
- Dense Pseudo-Labels – Instead of applying non-maximum suppression (NMS) aggressively, the method retains multiple high-quality pseudo-labels for rare classes, increasing their representation in training.
- Soft Pseudo-Labels – Low-confidence predictions that do not meet the joint confidence threshold are not entirely discarded. Instead, those with scores above a secondary threshold (e.g., 0.4) are retained as soft pseudo-labels, providing additional supervision without introducing excessive noise.
These strategies ensure that the model learns from a broader range of samples, improving detection performance for underrepresented classes.
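The soft pseudo-label rule can be sketched as a three-way split of predictions. The secondary threshold of 0.4 comes from the text; weighting each soft label's loss by its own joint score is an assumption of this sketch, not something the source specifies, and `split_pseudo_labels` is a hypothetical name.

```python
import numpy as np

def split_pseudo_labels(joint, class_thr, soft_thr=0.4):
    """Split predictions into hard labels, soft labels, and discarded boxes.

    joint: joint confidence score per box
    class_thr: adaptive threshold of each box's predicted class
    soft_thr: secondary threshold for soft pseudo-labels
    """
    joint = np.asarray(joint, dtype=float)
    class_thr = np.asarray(class_thr, dtype=float)
    hard = joint > class_thr                  # passes the joint confidence filter
    soft = ~hard & (joint > soft_thr)         # below the filter, above the secondary threshold
    # Assumption: hard labels get full weight, soft labels are down-weighted
    # by their score, everything else contributes nothing.
    weights = np.where(hard, 1.0, np.where(soft, joint, 0.0))
    return hard, soft, weights

hard, soft, weights = split_pseudo_labels(
    joint=[0.8, 0.5, 0.3],
    class_thr=[0.6, 0.6, 0.6],
)
print(hard, soft, weights)
```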
Experimental Results
Dataset and Evaluation
The proposed method is evaluated on the KITTI dataset, a standard benchmark for 3D object detection in autonomous driving. The experiments use only 1% and 2% labeled data, with the rest treated as unlabeled. Performance is measured using mean Average Precision (mAP) at 40 recall positions, with IoU thresholds of 0.7 for cars and 0.5 for pedestrians and cyclists.
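For readers unfamiliar with the 40-recall-position metric, the sketch below computes it from a precision-recall curve: precision is interpolated as the maximum precision at any recall at or above each of 40 evenly spaced recall levels (1/40 through 1), and the results are averaged. The function name `ap_r40` and the toy curve are illustrative; the official KITTI evaluation code also handles matching and difficulty levels, which are omitted here.

```python
import numpy as np

def ap_r40(precision, recall):
    """Average precision over 40 recall positions (1/40, 2/40, ..., 1)."""
    precision = np.asarray(precision, dtype=float)
    recall = np.asarray(recall, dtype=float)
    levels = np.linspace(1.0 / 40, 1.0, 40)
    interpolated = []
    for r in levels:
        reachable = recall >= r
        # Interpolated precision: best precision at any recall >= r, else 0.
        interpolated.append(precision[reachable].max() if reachable.any() else 0.0)
    return float(np.mean(interpolated))

# Toy curve: precision 1.0 up to recall 0.51, then 0.5 at full recall.
print(ap_r40(precision=[1.0, 0.5], recall=[0.51, 1.0]))
```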
Performance Comparison
- Baseline Comparison – Compared to PV-RCNN (a fully supervised method), the proposed approach improves detection performance by 6.5% for cars, 9% for pedestrians, and 25% for cyclists with only 1% labeled data.
- Semi-Supervised Methods – The method outperforms 3DIoUMatch by 4%, 6%, and 17% for cars, pedestrians, and cyclists, respectively. It also surpasses DetMatch in car and cyclist detection but lags slightly in pedestrian detection due to DetMatch’s use of additional 2D image information.
- Fully Supervised Comparison – With only 6% labeled data, the proposed method achieves performance comparable to fully supervised methods like SECOND and PointRCNN, demonstrating its efficiency in leveraging unlabeled data.
Ablation Studies
- Adaptive Thresholding – Experiments confirm that dynamic thresholds outperform fixed thresholds, particularly for rare classes where fixed thresholds either discard too many useful pseudo-labels or retain too many noisy ones.
- Soft Pseudo-Labels – Retaining low-confidence predictions (with scores above 0.4) improves model performance, especially for underrepresented categories. Setting the secondary threshold too low (e.g., 0.3) introduces noise, while setting it too high (e.g., 0.5) reduces the benefit.
- Dense Pseudo-Labels – Retaining multiple high-quality pseudo-labels for rare classes enhances detection accuracy, as seen in the improved pedestrian and cyclist detection results.
Qualitative Analysis
Visual comparisons show that the proposed method detects more objects, with fewer false positives, than PV-RCNN and 3DIoUMatch. For example, in scenes with distant or partially occluded vehicles, the method identifies objects that the other approaches miss.
Conclusion
This paper presents a semi-supervised 3D object detection framework that addresses key limitations of existing methods. By introducing adaptive threshold generation, joint confidence filtering, and dense and soft pseudo-labels, the approach improves detection performance, particularly for underrepresented classes such as pedestrians and cyclists.
The experimental results on the KITTI dataset demonstrate the method’s effectiveness, achieving substantial improvements over both semi-supervised and fully supervised baselines. Future work could explore integrating multi-modal data (e.g., LiDAR and cameras) to further enhance detection robustness.
doi.org/10.19734/j.issn.1001-3695.2024.04.0169