Deep Learning Applied to Two-Dimensional Color Doppler Flow Imaging Ultrasound Images Significantly Improves Diagnostic Performance in the Classification of Breast Masses: A Multicenter Study
Breast cancer remains one of the most prevalent cancers among women globally, emphasizing the critical need for early and accurate diagnostic methods. Ultrasound (US) imaging, particularly two-dimensional (2D) and color Doppler flow imaging (CDFI), has become a cornerstone in breast mass evaluation due to its non-invasive nature and accessibility. However, differentiating between clinically distinct categories of breast masses—inflammatory masses, adenosis, benign tumors, and malignant tumors—remains challenging, even for experienced radiologists. Traditional deep learning approaches for breast mass classification have predominantly focused on binary differentiation (benign vs. malignant), overlooking the clinical necessity of categorizing lesions into subtypes that directly inform treatment strategies. This study addresses this gap by developing a convolutional neural network (CNN) capable of classifying breast masses into four clinically relevant categories using multimodal ultrasound imaging.
Clinical Context and Motivation
In China, breast masses are categorized into four groups based on treatment pathways: inflammatory masses, adenosis, benign tumors, and malignant tumors. This classification is critical because each category necessitates distinct clinical management. For instance, inflammatory masses such as granulomatous mastitis (GM) often mimic malignancy on ultrasound, leading to unnecessary biopsies. Similarly, sclerosing adenosis (SA), a type of adenosis, frequently presents with irregular borders and microcalcifications, resembling malignant tumors. Misdiagnosis of these conditions can result in overtreatment or delayed interventions. Existing Computer-Aided Diagnosis (CAD) systems primarily focus on distinguishing benign from malignant lesions, leaving a diagnostic void for subtypes like adenosis and inflammatory masses. This study’s innovation lies in its multiclass classification framework, which aligns with clinical workflows and enhances decision-making precision.
Study Design and Data Collection
The multicenter retrospective analysis involved 3,623 patients from 13 hospitals across nine Chinese provinces. Data included 15,648 ultrasound images acquired between January 2016 and January 2018. Inclusion criteria required histopathological confirmation via biopsy or surgery, with lesions classified into the four predefined categories. Patients with foreign bodies (e.g., breast implants), HIV infection, or poor-quality images (e.g., blurred or artifact-laden) were excluded. The dataset comprised 1,601 benign tumors, 1,179 malignant tumors, 572 inflammatory masses, and 271 adenosis cases. Images were obtained using diverse US systems (GE LOGIQ E9, Siemens, Hitachi, etc.), ensuring heterogeneity in equipment and imaging protocols.
Deep Learning Architecture
The CNN architecture comprised two modules: a detection module for localizing breast masses and a classification module for categorizing lesions.
Detection Module:
- Feature Extraction: ResNet-50 generated feature maps from input images, while Feature Pyramid Networks (FPN) captured multi-scale features to accommodate variations in lesion size and imaging equipment.
- Region Proposal: Bounding-box regression identified candidate regions, and non-maximum suppression removed low-confidence and overlapping proposals. Focal loss addressed the foreground–background class imbalance during training.
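The focal loss used here down-weights easy, well-classified proposals so the rare mass regions dominate the gradient. A minimal sketch follows; the `alpha` and `gamma` values are the commonly used defaults from the original focal-loss formulation, not parameters reported by this study:

```python
import numpy as np

def focal_loss(probs, labels, alpha=0.25, gamma=2.0):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).
    probs: predicted foreground probabilities; labels: 0/1 ground truth.
    The (1 - p_t)**gamma factor shrinks the loss of confident, easy
    examples, counteracting the background-heavy proposal distribution."""
    probs = np.clip(probs, 1e-7, 1 - 1e-7)
    p_t = np.where(labels == 1, probs, 1 - probs)
    alpha_t = np.where(labels == 1, alpha, 1 - alpha)
    return np.mean(-alpha_t * (1 - p_t) ** gamma * np.log(p_t))

# A confident easy negative contributes far less than a hard positive.
easy = focal_loss(np.array([0.02]), np.array([0]))
hard = focal_loss(np.array([0.02]), np.array([1]))
```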
Classification Module:
- Model Variations: Three configurations were evaluated:
- 2D Model: Solely using grayscale ultrasound images.
- 2D-CDFI Model: Combining 2D and color Doppler images to integrate structural and vascular information.
- 2D-CDFI-PW Model: Incorporating pulsed-wave Doppler (PW) spectral data alongside 2D and CDFI.
- Attention Mechanisms: For the 2D-CDFI-PW model, global pooling and attention mechanisms fused spectral data from PW with 2D and CDFI features.
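The attention-based fusion of pooled per-modality features can be illustrated roughly as below. The study does not spell out its exact fusion architecture, so the feature dimension and the learned scoring vector `w` are hypothetical:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse_modalities(feat_2d, feat_cdfi, w):
    """Attention-weighted fusion of globally pooled modality features.
    feat_2d, feat_cdfi: 1-D descriptors after global average pooling;
    w: a learned scoring vector (illustrative only). Each modality gets
    a scalar score, softmax turns the scores into weights, and the
    weighted sum yields one fused descriptor."""
    stacked = np.stack([feat_2d, feat_cdfi])   # (2, d)
    scores = stacked @ w                       # one score per modality
    attn = softmax(scores)                     # weights sum to 1
    return attn @ stacked                      # (d,) fused feature

rng = np.random.default_rng(0)
d = 8
fused = fuse_modalities(rng.normal(size=d), rng.normal(size=d),
                        rng.normal(size=d))
```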
Training utilized stochastic gradient descent (SGD) with a learning rate of 0.001, batch size of 64, and data augmentation (rotation ±30°, scaling 0.5–1.5×) to mitigate overfitting. Snapshot Ensembling combined five weak models to enhance robustness.
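Snapshot Ensembling typically pairs a cyclic, cosine-annealed learning rate with a weight snapshot taken at the end of each cycle; the snapshots then vote at inference time. A minimal sketch, assuming a cosine schedule (only the 0.001 base rate and the five-member ensemble come from the study; the cycle length is illustrative):

```python
import math

def snapshot_lr(step, steps_per_cycle, base_lr=0.001):
    """Cosine-annealed cyclic learning rate: the rate restarts at
    base_lr each cycle and decays toward zero, so the model settles
    into a different local minimum per cycle."""
    t = (step % steps_per_cycle) / steps_per_cycle
    return base_lr / 2 * (1 + math.cos(math.pi * t))

def ensemble_predict(snapshot_probs):
    """Average the per-class probabilities emitted by each snapshot
    model (here, the five weak models mentioned in the text)."""
    n = len(snapshot_probs)
    k = len(snapshot_probs[0])
    return [sum(p[i] for p in snapshot_probs) / n for i in range(k)]
```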
Key Findings
Performance Across Imaging Modalities
The 2D-CDFI model achieved superior performance compared to 2D and 2D-CDFI-PW models:
- Accuracy: 89.2% (2D-CDFI) vs. 87.9% (2D) and 88.7% (2D-CDFI-PW).
- AUC Values:
- Benign tumors: 0.94 (95% CI: 0.93–0.95).
- Malignant tumors: 0.96 (95% CI: 0.95–0.97).
- Inflammatory masses: 0.80 (95% CI: 0.77–0.83).
- Adenosis: 0.81 (95% CI: 0.78–0.84).
Sensitivity and specificity exceeded 90% for benign and malignant tumors but were lower for inflammatory masses (55% sensitivity) and adenosis (46% sensitivity), reflecting dataset imbalances and subtle imaging features in these categories.
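The per-category sensitivity and specificity figures above are one-vs-rest quantities derived from a multiclass confusion matrix. A sketch of the computation, using a 3x3 matrix with made-up counts rather than the study's data:

```python
import numpy as np

def per_class_sens_spec(cm):
    """One-vs-rest sensitivity and specificity from a KxK confusion
    matrix (rows = true class, columns = predicted class). For each
    class, the diagonal entry is TP; the rest of its row is FN, the
    rest of its column is FP, and everything else is TN."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp
    fp = cm.sum(axis=0) - tp
    tn = cm.sum() - tp - fn - fp
    return tp / (tp + fn), tn / (tn + fp)

sens, spec = per_class_sens_spec([[90, 5, 5],
                                  [10, 80, 10],
                                  [20, 20, 60]])
```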
Impact of Lesion Size
The 2D model’s accuracy varied slightly with lesion size:
- ≤1 cm: 81.7%.
- 1–2 cm: 82.3%.
- 2–5 cm: 85.1%.
- >5 cm: 84.6%.
No significant differences were observed between size groups (P > 0.05), demonstrating the model's robustness across lesion sizes.
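A comparison like this is typically made with a Pearson chi-square test on correct/incorrect counts per size group. A sketch with illustrative counts (not the study's data):

```python
def chi2_stat(table):
    """Pearson chi-square statistic for an R x C contingency table:
    sum over cells of (observed - expected)**2 / expected, where
    expected = row_total * col_total / grand_total."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    total = sum(row)
    return sum((table[i][j] - row[i] * col[j] / total) ** 2
               / (row[i] * col[j] / total)
               for i in range(len(table)) for j in range(len(table[0])))

# Correct vs. incorrect classifications in four lesion-size groups
# (illustrative counts only).
table = [[82, 82, 85, 85],   # correct
         [18, 18, 15, 15]]   # incorrect
stat = chi2_stat(table)
CRITICAL_3DF_05 = 7.815  # chi-square critical value, df = 3, alpha = 0.05
# stat < CRITICAL_3DF_05 means the size-group differences are not
# significant at the 5% level, consistent with the reported P > 0.05.
```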
Multicenter Validation and Generalizability
Independent validation using data from the China-Japan Friendship Hospital (CJ) confirmed the model’s adaptability:
- CJ Dataset (219 cases):
- 2D Model: 88.9% accuracy for benign, 90.2% for malignant.
- 2D-CDFI Model: 85.7% accuracy for benign, 90.9% for malignant.
Disparities in performance across hospitals highlighted variability in imaging protocols and lesion prevalence. For example, adenosis cases from Zhengzhou University Hospital were classified with only 17% accuracy, owing to the limited training samples for that category.
Comparison with Radiologists
The CNN outperformed 37 experienced radiologists in a blinded evaluation of 50 test images:
- CNN: 89.2% accuracy, 400 ms processing time (GPU).
- Radiologists: Mean accuracy 30% (range: 10–45%), with an average interpretation time of 314 seconds.
This stark contrast underscores the CNN’s potential to reduce diagnostic delays and improve workflow efficiency.
Technical and Clinical Implications
- Role of CDFI: The integration of color Doppler significantly enhanced classification accuracy by capturing vascular patterns indicative of malignancy (e.g., chaotic intratumoral blood flow). However, PW imaging did not contribute meaningfully, likely due to insufficient training data (only 222 PW images).
- Algorithm Robustness: The model’s consistent performance across equipment brands and lesion sizes supports its applicability in diverse clinical settings, including resource-limited regions.
- Clinical Workflow Integration: Real-time processing (1-second latency on CPU) enables seamless integration into clinical practice, aiding radiologists in prioritizing high-risk cases and reducing unnecessary biopsies.
Limitations and Future Directions
- Data Imbalance: Inflammatory masses and adenosis were underrepresented, affecting model sensitivity. Future studies should prioritize balanced datasets.
- PW Imaging: Larger PW datasets are needed to validate its utility.
- Prospective Validation: While the multicenter design enhances generalizability, prospective trials are necessary to assess real-world performance.
Conclusion
This study demonstrates that deep learning, particularly with 2D-CDFI imaging, achieves high diagnostic accuracy in classifying breast masses into four clinically actionable categories. By surpassing human radiologists in both speed and precision, the proposed CNN model offers a transformative tool for reducing diagnostic errors, optimizing treatment planning, and alleviating the workload of ultrasound practitioners. Future efforts should focus on expanding datasets for underrepresented categories and integrating real-time decision support systems into clinical workflows.
doi.org/10.1097/CM9.0000000000001329