Clinically Applicable Gleason Grading System for Prostate Cancer Based on Deep Learning

Prostate cancer is one of the most common malignant tumors of the male genital system, with approximately 1.1 million new cases reported globally in 2012. Accurate diagnosis is critical for successful treatment, particularly while the disease is still confined to the prostate gland. The Gleason grading system, first established by Donald Gleason between 1966 and 1974, remains one of the most powerful predictors of oncological outcomes for men with prostate cancer. Gleason patterns range from 1 to 5, with higher patterns indicating poorer differentiation, worse prognosis, and a higher likelihood of metastasis. The total Gleason score (GS) is the sum of the dominant and non-dominant Gleason patterns.
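As a small illustration of how the total score is formed, the sketch below adds the dominant and non-dominant patterns. The function name `gleason_score` and the range check are illustrative assumptions, not part of the published system.

```python
def gleason_score(dominant: int, non_dominant: int) -> int:
    """Return the total Gleason score as the sum of the two patterns.

    Hypothetical helper for illustration only; patterns outside 1..5
    are rejected because the text defines patterns on that range.
    """
    for pattern in (dominant, non_dominant):
        if not 1 <= pattern <= 5:
            raise ValueError(f"Gleason pattern must be in 1..5, got {pattern}")
    return dominant + non_dominant


print(gleason_score(3, 4))  # 7
print(gleason_score(4, 5))  # 9
```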

Despite its clinical significance, the Gleason grading system has limitations. Differences in interpretation among pathologists, together with the subjective estimation of the proportion of each grade in a specimen, can lead to poor reproducibility and even misdiagnosis, particularly for small lesions. To address these challenges, we proposed a deep learning-based Gleason grading system to assist in the histopathological diagnosis of prostate cancer. The system aims to improve the objectivity, accuracy, and efficiency of prostate cancer diagnosis.

Methodology

Data Collection and Preparation

The study utilized prostate biopsy slides collected from the China-Japan Friendship Hospital. A total of 123 hematoxylin-eosin (HE)-stained slides were used for model training, and 10 slides were used for validation. Additionally, 137 HE-stained slides were collected for model testing. All slides underwent rigorous quality control to ensure the tissue was complete, flat, and free from knife marks, cracks, or bubbles. Corresponding immunohistochemistry (IHC) slides, including p63, 34bE12, and p504S, were used to assist in the labeling process.

The slides were digitized using a KF-PRO-005 scanner at 400x magnification. The tissue area was divided into 320×320-pixel patches at a 200x field of view (0.5 µm/pixel). A total of 152,139 training patches were obtained: Gleason pattern 3 (25,316 patches), Gleason pattern 4 (31,176 patches), Gleason pattern 5 (25,344 patches), high-grade prostatic intraepithelial neoplasia (HPIN) (3,252 patches), inflammation (2,744 patches), and normal tissue (64,307 patches).
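The patch extraction and class breakdown above can be sketched as follows. This is a minimal sketch under stated assumptions: the source does not give the stride or the handling of slide edges, so the non-overlapping grid and the dropping of partial edge tiles are assumptions, and `tile_patches` is a hypothetical helper. The class counts are the ones reported in the text.

```python
import numpy as np

PATCH = 320  # patch edge length in pixels, as stated in the text


def tile_patches(image: np.ndarray, patch: int = PATCH):
    """Yield (top, left, tile) for non-overlapping patch x patch tiles.

    Assumption: tiles do not overlap and partial tiles at the right and
    bottom edges are dropped; the source does not specify either choice.
    """
    height, width = image.shape[:2]
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            yield top, left, image[top:top + patch, left:left + patch]


# Stand-in for a region of a whole-slide image (RGB, all zeros).
demo = np.zeros((1000, 700, 3), dtype=np.uint8)
tiles = list(tile_patches(demo))
print(len(tiles))  # 6 (3 rows x 2 columns)

# Training patch counts per label, copied from the text.
patch_counts = {
    "gleason_3": 25_316,
    "gleason_4": 31_176,
    "gleason_5": 25_344,
    "hpin": 3_252,
    "inflammation": 2_744,
    "normal": 64_307,
}
total = sum(patch_counts.values())
print(total)  # 152139, matching the reported total
for label, count in patch_counts.items():
    print(f"{label}: {count / total:.1%}")
```

The per-class percentages make the class imbalance visible: normal tissue accounts for the largest share of patches, while HPIN and inflammation are comparatively rare.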

Labeling Process

Two licensed pathologists with 11 and 30 years of experience in prostate pathological diagnosis, respectively, reviewed all whole-slide images (WSIs) using an in-house labeling system. The labels included Gleason patterns 3–5, HPIN, inflammation, and normal tissue. The slides were first assigned to the first pathologist and then reviewed by the senior pathologist. During the labeling process, the pathologists used the corresponding IHC slides as references to ensure accuracy.

Model Training

The deep learning model used in this study was based on the DeepLab v3 image segmentation model with ResNet-50 as the backbone. The model parameters were initialized from a pre-trained gastric cancer detection model and then fine-tuned on the prostate training data (transfer learning). Training was performed with TensorFlow on 8 NVIDIA GTX 1080 Ti GPUs, using the Adam optimizer with a learning rate of 0.0001, a batch size of 256, and 28,000 training iterations. Histopathology-oriented data augmentation was applied to improve the model’s robustness.
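To put the training schedule in perspective, the short calculation below estimates how many passes over the training set the reported settings imply. It assumes each iteration consumes one batch of 256 patches in total (not 256 per GPU) and that the batch is split evenly across the 8 GPUs; the source states neither, so the results are rough estimates only.

```python
TRAIN_PATCHES = 152_139  # training patches reported in the text
BATCH_SIZE = 256         # reported batch size (assumed to be the global batch)
ITERATIONS = 28_000      # reported training iterations
NUM_GPUS = 8             # reported number of GPUs

patches_seen = BATCH_SIZE * ITERATIONS
approx_epochs = patches_seen / TRAIN_PATCHES
per_gpu_batch = BATCH_SIZE // NUM_GPUS  # assumes an even split across GPUs

print(patches_seen)             # 7168000
print(round(approx_epochs, 1))  # 47.1
print(per_gpu_batch)            # 32
```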

The slide-level prediction was defined as the average of the top 100 probabilities of the pixel-level predictions. The model was evaluated in a binary classification manner, where “malignant” was defined as Gleason patterns 3–5 and “benign” as HPIN, inflammation, and normal tissue.
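The slide-level aggregation described above can be sketched with NumPy. This is a minimal sketch, assuming the pixel-level output is available as an array of malignancy probabilities; the function name `slide_score` and the handling of slides with fewer than 100 pixels are illustrative assumptions.

```python
import numpy as np


def slide_score(pixel_probs: np.ndarray, k: int = 100) -> float:
    """Average the k largest pixel-level probabilities on a slide.

    Assumption: if the slide has fewer than k pixels, all of them are used.
    """
    flat = np.asarray(pixel_probs, dtype=float).ravel()
    if flat.size == 0:
        raise ValueError("pixel_probs must contain at least one value")
    k = min(k, flat.size)
    # After partitioning at index size - k, the last k entries are the k largest.
    top_k = np.partition(flat, flat.size - k)[flat.size - k:]
    return float(top_k.mean())


# Synthetic probability map: mostly low values with a 10x10 high-probability focus.
probs = np.full((200, 200), 0.1)
probs[:10, :10] = 0.9
print(round(slide_score(probs), 3))  # 0.9
```

Averaging only the highest probabilities keeps a small focus of cancer from being diluted by the much larger area of benign tissue, which would happen with a plain mean over all pixels (here about 0.102).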

Model Performance

The deep learning model achieved a sensitivity of 100.00%, a specificity of 87.04%, and an accuracy of 94.89% in distinguishing malignant from benign tissue. The model’s Gleason score agreed exactly with the senior pathologist’s diagnosis in 100 of the 137 test cases. In a further 22 cases, the model’s score differed from the senior pathologist’s by only one point.
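The reported percentages can be reproduced from a confusion matrix. The source gives only the percentages and the 137-slide total, so the counts below (83 malignant and 54 benign slides, with 7 false positives) are back-calculated assumptions that happen to match the reported figures, not numbers stated in the study.

```python
def binary_metrics(tp: int, fn: int, tn: int, fp: int):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Back-calculated counts (assumption): 83 malignant, 54 benign, 137 slides total.
sens, spec, acc = binary_metrics(tp=83, fn=0, tn=47, fp=7)
print(f"{sens:.2%} {spec:.2%} {acc:.2%}")  # 100.00% 87.04% 94.89%

# Agreement with the senior pathologist, using the counts given in the text.
TOTAL_CASES = 137
exact, within_one = 100, 22
print(f"{exact / TOTAL_CASES:.2%}")                 # 72.99%
print(f"{(exact + within_one) / TOTAL_CASES:.2%}")  # 89.05%
```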

The model demonstrated superior performance in several cases, particularly in identifying small foci of cancer and local Gleason pattern 4 lesions within a Gleason pattern 3 background. Additionally, the model correctly predicted 20 samples with a GS ≥ 8, while the attending pathologist correctly predicted only 13. The model also outperformed the attending pathologist in detecting HPIN, with a sensitivity of 100.00% compared to the attending pathologist’s sensitivity of 87.04%.

Validation and Testing

The model was further validated using historical prostate samples collected from May 2013 to July 2015 at the China-Japan Friendship Hospital. The model achieved a sensitivity of 100.0% and a specificity of 91.4% for malignant tumor detection. Additionally, 166 slides from the Chinese PLA General Hospital were used for testing, where the model achieved a sensitivity of 97.0% and a specificity of 77.4%.

Limitations and Future Work

Despite its promising performance, the model has limitations. It produced some false-positive predictions and, in some cases, inaccurate Gleason grades. Addressing these issues will require more training samples to further optimize the model and improve its specificity. The model’s performance on rare or complex cases also needs further investigation.

Conclusion

The deep learning-based Gleason grading system proposed in this study offers a clinically applicable tool for prostate cancer diagnosis. The system can intuitively identify lesions and provide objective Gleason scores, saving significant time for pathologists. It demonstrated high accuracy and consistency in distinguishing malignant from benign tissue and outperformed human pathologists in several cases. However, ongoing optimization and validation are necessary to address the model’s limitations and ensure its widespread clinical applicability.

doi.org/10.1097/CM9.0000000000001220
