Development and Validation of Radiomics Model Built by Incorporating Machine Learning for Identifying Liver Fibrosis and Early-Stage Cirrhosis
Liver fibrosis (LF) and early-stage cirrhosis (ESC) are critical conditions in hepatology, as they represent stages of liver disease that can potentially be reversed if identified and treated early. Advanced cirrhosis, on the other hand, is often irreversible and associated with poor clinical outcomes. The gold standard for diagnosing LF and ESC has traditionally been liver biopsy, an invasive procedure with inherent risks and limitations. However, recent advancements in quantitative imaging techniques, particularly radiomics, have opened the door to non-invasive diagnostic methods. This study focuses on the development and validation of a radiomics model based on diffusion-weighted imaging (DWI) to accurately identify LF and ESC, offering a promising alternative to biopsy.
Introduction
Hepatic cirrhosis is a major public health concern worldwide, particularly in China, where it contributes significantly to morbidity and mortality. Cirrhosis typically develops from progressive LF. In the METAVIR scoring system, LF is staged F1 to F3, and ESC corresponds to F4, the earliest stage of cirrhosis, in which liver function is still preserved. As cirrhosis advances, however, liver failure becomes increasingly likely. Early identification of LF and ESC is therefore crucial, because timely intervention can halt or even reverse disease progression. Although liver biopsy has been the gold standard for diagnosis, its invasiveness, sampling variability, and poor patient tolerance have driven the search for non-invasive alternatives.
Medical imaging, particularly magnetic resonance imaging (MRI), has emerged as a powerful tool for non-invasive diagnosis. Among MRI techniques, diffusion-weighted imaging (DWI) has shown promise in evaluating liver diseases, including LF and cirrhosis. Radiomics, a quantitative imaging analysis method, extracts a large number of features from medical images to develop biomarkers for diagnosis and prognosis. This study aims to leverage radiomics and machine learning to develop a robust model for identifying LF and ESC based on DWI data.
Methods
Study Design and Ethical Approval
This retrospective study was conducted at Shandong Cancer Hospital and Institute and was approved by the Institutional Review Board. It included 369 cases: 108 patients with LF, 116 patients with ESC, and 145 patients with healthy livers. All patients underwent DWI. Cases were included only if liver morphology was normal and, for the LF and ESC groups, the disease was pathologically confirmed with a clearly assigned stage. Cases with abnormal liver morphology, significant ascites, or insufficient image quality were excluded.
Image Acquisition and Preprocessing
All MRI scans were performed on a Philips 3.0 T scanner with an eight-channel abdominal phased-array coil. DWI was acquired with b-values of 0, 400, and 800 s/mm², chosen because previous research indicated that these values provide non-redundant information. High-resolution T1-weighted images were also acquired to assist in delineating volumes of interest (VOIs).
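For context, three b-values are enough to estimate the apparent diffusion coefficient (ADC) under the standard mono-exponential model S(b) = S0 · exp(−b · ADC), which becomes a straight line after taking ln S(b). The study itself did not use ADC maps (see Limitations); the sketch below only illustrates how the acquired b-values relate to ADC. The helper `fit_adc` and the synthetic signal values are hypothetical, not part of the study.

```python
import math


def fit_adc(b_values, signals):
    """Fit S(b) = S0 * exp(-b * ADC) by ordinary least squares on ln S(b).

    b_values are in s/mm^2, so the returned ADC is in mm^2/s.
    """
    if len(b_values) != len(signals) or len(b_values) < 2:
        raise ValueError("need at least two (b, signal) pairs")
    if any(s <= 0 for s in signals):
        raise ValueError("signals must be positive to take the logarithm")
    logs = [math.log(s) for s in signals]
    n = len(b_values)
    mean_b = sum(b_values) / n
    mean_log = sum(logs) / n
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    if sxx == 0:
        raise ValueError("b-values must not all be equal")
    sxy = sum((b - mean_b) * (y - mean_log) for b, y in zip(b_values, logs))
    slope = sxy / sxx                      # slope of ln S versus b equals -ADC
    intercept = mean_log - slope * mean_b  # intercept equals ln S0
    return math.exp(intercept), -slope


# Synthetic, noise-free voxel (illustrative numbers, not study data).
b_values = [0, 400, 800]
signals = [1000.0 * math.exp(-b * 1.2e-3) for b in b_values]
s0, adc = fit_adc(b_values, signals)
print(f"S0 = {s0:.1f}, ADC = {adc:.2e} mm^2/s")
```

With real, noisy data the least-squares fit over all three b-values averages out some noise, which is one reason to acquire more than two b-values.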
Volume of Interest (VOI) Definition
Two experienced radiologists manually delineated VOIs in the liver parenchyma using MIM Maestro software. VOIs were placed in three liver segments (II/III, V/VI, and VII) on the b = 0 s/mm² DWI images and then copied to the corresponding positions on the b = 400 and 800 s/mm² images. This approach kept the sampled regions identical across b-values, supporting consistent and reproducible feature extraction.
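Because each VOI is drawn once on the b = 0 s/mm² images and reused for the other b-values, a single mask can index every b-value volume. The following is a minimal sketch, assuming all b-value volumes share one voxel grid (same dimensions, co-registered) and are stored as nested z/y/x lists. `propagate_voi` and the toy volumes are hypothetical illustrations, not part of MIM Maestro.

```python
def grid_shape(volume):
    """Return the (z, y, x) dimensions of a nested-list volume."""
    return (len(volume), len(volume[0]), len(volume[0][0]))


def voi_intensities(mask, volume):
    """Collect the voxel values of `volume` wherever `mask` is nonzero."""
    return [
        value
        for mask_plane, plane in zip(mask, volume)
        for mask_row, row in zip(mask_plane, plane)
        for inside, value in zip(mask_row, row)
        if inside
    ]


def propagate_voi(mask, volumes_by_b):
    """Apply one VOI mask (drawn on b = 0) to every b-value volume."""
    for b, volume in volumes_by_b.items():
        if grid_shape(volume) != grid_shape(mask):
            raise ValueError(f"b = {b} volume does not match the mask grid")
    return {b: voi_intensities(mask, vol) for b, vol in volumes_by_b.items()}


# Toy 1 x 2 x 2 example: the mask selects two voxels.
mask = [[[1, 0], [0, 1]]]
volumes = {
    0: [[[100, 90], [80, 70]]],
    400: [[[60, 55], [50, 45]]],
    800: [[[30, 28], [25, 22]]],
}
print(propagate_voi(mask, volumes))
```

The shape check matters: if a b-value series were reconstructed on a different grid, the same mask would silently sample different anatomy.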
Radiomics Feature Extraction
A total of 93 radiomics features were extracted from each VOI using the SlicerRadiomics extension of 3D Slicer. These features fall into six groups: first-order (intensity histogram-based) features and features derived from the gray-level co-occurrence matrix (GLCM), gray-level run-length matrix (GLRLM), gray-level size-zone matrix (GLSZM), neighboring gray-tone difference matrix (NGTDM), and gray-level dependence matrix (GLDM). To enhance reproducibility, the intensity values within each VOI were discretized into 100 fixed bins before feature calculation.
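The discretization step and two of the feature families can be illustrated as follows. This is a minimal sketch, assuming that "100 fixed bins" means a fixed bin count spanning each VOI's intensity range (PyRadiomics, which underlies SlicerRadiomics, also supports a fixed bin width instead). The function names are hypothetical, and the formulas are simplified versions of first-order entropy and of GLCM contrast for a single 2-D offset, not PyRadiomics' exact implementation, which aggregates over multiple directions.

```python
import math
from collections import Counter


def discretize_fixed_count(values, n_bins=100):
    """Map intensities to gray levels 1..n_bins using equal-width bins over [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1] * len(values)
    width = (hi - lo) / n_bins
    # The maximum intensity would land in bin n_bins + 1, so clip it into the last bin.
    return [min(int((v - lo) / width) + 1, n_bins) for v in values]


def first_order_entropy(levels):
    """Shannon entropy (bits) of the gray-level histogram."""
    counts = Counter(levels)
    n = len(levels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def glcm_contrast(image, dy=0, dx=1):
    """Contrast of a symmetric, normalized GLCM for one 2-D slice and one offset."""
    pairs = Counter()
    rows, cols = len(image), len(image[0])
    for y in range(rows):
        for x in range(cols):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < rows and 0 <= x2 < cols:
                i, j = image[y][x], image[y2][x2]
                pairs[(i, j)] += 1
                pairs[(j, i)] += 1  # count both directions -> symmetric matrix
    total = sum(pairs.values())
    return sum((i - j) ** 2 * c / total for (i, j), c in pairs.items())


levels = discretize_fixed_count([0.0, 25.0, 50.0, 75.0, 100.0], n_bins=4)
print(levels, round(first_order_entropy(levels), 4), glcm_contrast([[1, 2], [3, 4]]))
```

A fixed bin count normalizes each VOI to the same number of gray levels, so texture matrices have comparable sizes across patients; a fixed bin width instead preserves absolute intensity differences.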
Model Construction and Feature Selection
Two modeling strategies were employed in this study. Plan 1 involved a two-step process: Model 1 was designed to differentiate between healthy and abnormal livers, while Model 2 aimed to classify LF and ESC within the abnormal liver group. Plan 2 consisted of two parallel models: Model 1 differentiated healthy livers from LF, and Model 2 differentiated healthy livers from ESC.
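The difference between the two plans lies in how a case is routed through the models. The sketch below replaces the trained SVMs with hypothetical threshold rules on a single toy score (`p1_model1` and the other stand-ins are illustrative only). It shows that the Plan 1 cascade always returns one of three labels, whereas each Plan 2 model can only answer "healthy" or its own disease stage, so a case from the stage it never saw is forced into one of those two labels.

```python
def plan1_predict(features, is_abnormal, is_esc_not_lf):
    """Plan 1 cascade: screen healthy vs abnormal, then stage abnormal livers as LF or ESC."""
    if not is_abnormal(features):
        return "healthy"
    return "ESC" if is_esc_not_lf(features) else "LF"


def plan2_predict(features, is_lf_not_healthy, is_esc_not_healthy):
    """Plan 2 parallel models: each separates healthy livers from a single stage only."""
    return {
        "model1": "LF" if is_lf_not_healthy(features) else "healthy",
        "model2": "ESC" if is_esc_not_healthy(features) else "healthy",
    }


# Hypothetical stand-ins for the trained SVMs, each thresholding one toy score.
def p1_model1(f):  # Plan 1, Model 1: healthy vs abnormal
    return f[0] > 0.3


def p1_model2(f):  # Plan 1, Model 2: LF vs ESC (abnormal livers only)
    return f[0] > 0.6


def p2_model1(f):  # Plan 2, Model 1: healthy vs LF (never trained on ESC)
    return f[0] > 0.3


def p2_model2(f):  # Plan 2, Model 2: healthy vs ESC (never trained on LF)
    return f[0] > 0.6


for truth, case in [("healthy", [0.1]), ("LF", [0.5]), ("ESC", [0.8])]:
    print(truth, plan1_predict(case, p1_model1, p1_model2),
          plan2_predict(case, p2_model1, p2_model2))
```

With these toy thresholds, Plan 2's Model 1 labels the ESC case "LF" and its Model 2 labels the LF case "healthy", which is the kind of cross-input false prediction discussed in the results.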
Feature selection was performed in two steps. First, univariate analysis retained features that differed between the groups at P < 0.1. Second, the RELIEFF algorithm ranked the retained features, and the top three from each of the six feature categories were selected, giving 18 features for model training. Classification models were then built with a support vector machine (SVM) using a radial basis function (RBF) kernel. To reduce the risk of overfitting, ten-fold cross-validation was repeated 1000 times during training.
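The two-step selection can be sketched under several stated assumptions. The article does not name the univariate test, so a two-sided Mann-Whitney U test (normal approximation, no tie correction) stands in for it. RELIEFF (ReliefF) is shown in its two-class form with k nearest hits and misses on range-scaled features, and it is assumed to rank all features that pass the filter jointly before the top three per category are kept. The SVM is omitted; in Python one would typically use an RBF-kernel classifier such as scikit-learn's `SVC(kernel="rbf")`. All function names here are hypothetical.

```python
import math


def mann_whitney_p(group_a, group_b):
    """Two-sided Mann-Whitney U test, normal approximation without tie correction."""
    n1, n2 = len(group_a), len(group_b)
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in group_a for b in group_b)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return math.erfc(abs(u - mu) / sigma / math.sqrt(2))


def relieff_weights(X, y, k=3):
    """Two-class ReliefF weights (Manhattan distance on range-scaled features)."""
    n, d = len(X), len(X[0])
    span = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(d)]

    def diff(j, a, b):
        return abs(a[j] - b[j]) / span[j]

    weights = [0.0] * d
    for i in range(n):
        neighbours = sorted(
            (sum(diff(j, X[i], X[m]) for j in range(d)), m) for m in range(n) if m != i
        )
        hits = [m for _, m in neighbours if y[m] == y[i]][:k]
        misses = [m for _, m in neighbours if y[m] != y[i]][:k]
        for j in range(d):
            # Penalize features that differ among same-class neighbours,
            # reward features that differ from other-class neighbours.
            if hits:
                weights[j] -= sum(diff(j, X[i], X[h]) for h in hits) / (n * len(hits))
            if misses:
                weights[j] += sum(diff(j, X[i], X[m]) for m in misses) / (n * len(misses))
    return weights


def select_features(X, y, categories, p_max=0.1, per_category=3, k=3):
    """Univariate filter (P < p_max), then keep the top ReliefF features per category."""
    kept = [
        j for j in range(len(X[0]))
        if mann_whitney_p([r[j] for r, c in zip(X, y) if c == 1],
                          [r[j] for r, c in zip(X, y) if c == 0]) < p_max
    ]
    if not kept:
        return []
    weights = relieff_weights([[r[j] for j in kept] for r in X], y, k)
    chosen, counts = [], {}
    for j, _ in sorted(zip(kept, weights), key=lambda t: t[1], reverse=True):
        if counts.get(categories[j], 0) < per_category:
            chosen.append(j)
            counts[categories[j]] = counts.get(categories[j], 0) + 1
    return sorted(chosen)


# Toy data: feature 0 separates the classes, feature 1 does not.
y = [0] * 5 + [1] * 5
X = [[v, w] for v, w in zip([0, 1, 2, 3, 4, 10, 11, 12, 13, 14],
                            [1, 2, 1, 2, 1, 2, 1, 2, 1, 2])]
print(select_features(X, y, categories=["firstorder", "glcm"]))
```

Capping the count per category, as the study did, keeps every texture family represented instead of letting one highly correlated family dominate the final 18 features.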
Performance Evaluation
Model performance was evaluated with the area under the receiver operating characteristic curve (AUC) and accuracy. Cases were randomly divided into training and validation cohorts; the sample sizes for each model are reported in the original article.
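The two metrics, and the repeated ten-fold cross-validation described in the Methods, can be sketched with the standard library. In this minimal sketch, AUC is computed through its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half), the fold generator is not stratified, and the function names are hypothetical.

```python
import random


def auc(scores, labels):
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def accuracy(predictions, labels):
    """Fraction of cases whose predicted label equals the reference label."""
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)


def repeated_kfold(n_cases, k=10, repeats=1000, seed=0):
    """Yield (train, test) index lists for `repeats` reshuffled k-fold partitions."""
    rng = random.Random(seed)
    order = list(range(n_cases))
    for _ in range(repeats):
        rng.shuffle(order)
        for fold in range(k):
            test = sorted(order[fold::k])
            held_out = set(test)
            train = [i for i in range(n_cases) if i not in held_out]
            yield train, test


print(auc([0.9, 0.7, 0.4, 0.2], [1, 1, 0, 0]), accuracy([1, 0, 1], [1, 1, 1]))
print(sum(1 for _ in repeated_kfold(30, k=10, repeats=5)))
```

In each fold, a model would be fitted on `train` and scored on `test`; averaging over many reshuffles reduces the dependence of the estimate on any single random partition.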
Results
Univariate Analysis and Feature Selection
Univariate analysis identified 75 and 63 features with P < 0.1 for Models 1 and 2 of Plan 1, respectively, and 62 and 59 features for Models 1 and 2 of Plan 2. RELIEFF then ranked these features, and the top three from each category were selected for model training.
Model Performance
In Plan 1, Model 1 achieved an AUC of 0.973 (95% CI: 0.946–1.000) and an accuracy of 91.5% in the training cohort, and an AUC of 0.948 (95% CI: 0.903–0.993) and an accuracy of 89.1% in the validation cohort. Model 2 achieved an AUC of 0.944 (95% CI: 0.905–0.983) and an accuracy of 88.9% in the training cohort, and an AUC of 0.968 (95% CI: 0.940–0.996) and an accuracy of 92.6% in the validation cohort.
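The article does not state how the 95% CIs were obtained. One common choice, assumed here purely for illustration, is the Hanley-McNeil standard error with a normal approximation, clipped to [0, 1]; clipping of this kind is one way an upper bound of 1.000 can arise. The helper name and the sample sizes in the usage line are hypothetical, not the study's.

```python
import math


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Normal-approximation CI for an AUC using the Hanley-McNeil standard error."""
    q1 = auc / (2 - auc)          # P(two random positives both outscore one negative)
    q2 = 2 * auc ** 2 / (1 + auc)  # P(one positive outscores two random negatives)
    variance = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    se = math.sqrt(variance)
    return se, max(0.0, auc - z * se), min(1.0, auc + z * se)


# Hypothetical cohort of 50 positive and 50 negative cases (not the study's numbers).
se, lower, upper = hanley_mcneil_ci(0.90, 50, 50)
print(f"SE = {se:.4f}, 95% CI = {lower:.3f}-{upper:.3f}")
```

Because the standard error shrinks with larger cohorts, the wider CIs for some validation results are consistent with smaller validation samples, though the exact method would need to be confirmed from the original article.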
In Plan 2, Model 1 achieved an AUC of 0.882 (95% CI: 0.845–0.919) and an accuracy of 82.5% in the training cohort, and an AUC of 0.857 (95% CI: 0.808–0.906) and an accuracy of 82.1% in the validation cohort. Model 2 achieved an AUC of 0.843 (95% CI: 0.793–0.899) and an accuracy of 74.3% in the training cohort, and an AUC of 0.863 (95% CI: 0.804–0.922) and an accuracy of 79.3% in the validation cohort.
Optimal Plan for Identifying LF and ESC
Plan 1 was judged the optimal strategy for identifying LF and ESC because it performed robustly in the validation cohort. Plan 2 was limited by false predictions: because its Model 1 was trained only on healthy livers and LF, and its Model 2 only on healthy livers and ESC, ESC cases input into Model 1 and LF cases input into Model 2 could only be assigned to one of the two classes each model had learned. When these false predictions were taken into account, Plan 2 achieved AUCs of only 0.774 (95% CI: 0.720–0.828) for LF and 0.698 (95% CI: 0.635–0.761) for ESC, making it less suitable for clinical use.
Discussion
These results demonstrate the potential of radiomics analysis of DWI for the non-invasive identification of LF and ESC. The proposed approach, particularly Plan 1, achieved high AUC values and accuracy in both the training and validation cohorts, suggesting that it is reliable and robust. Because it is non-invasive and repeatable, it could reduce reliance on liver biopsy as a diagnostic tool.
Previous studies have explored MRI-based texture analysis for staging LF, but few have focused on DWI radiomics. The DWI-based radiomics model in this study compares favorably with previous models based on gadoxetic acid-enhanced MRI, which reported AUCs of about 0.910 and an accuracy of 82.1%. The use of multiple b-values, combined with feature selection and machine learning, likely contributed to this performance.
Limitations and Future Directions
Despite its promising results, this study has limitations. It was retrospective, and apparent diffusion coefficient (ADC) maps were not available from the DWI acquisition; both factors may have affected the results. Future work should include prospective studies with larger patient cohorts and incorporate ADC maps to further validate the model. Incorporating clinical features, such as hepatic biological parameters, could also improve the model's diagnostic accuracy.
Conclusion
This study developed and validated a radiomics model based on DWI for the non-invasive identification of LF and ESC. In this single-center retrospective cohort, the model achieved high accuracy and robustness, making it a promising alternative to liver biopsy. As radiomics continues to evolve, its integration with advanced imaging techniques and machine learning holds great potential for improving the diagnosis and management of liver diseases.
doi.org/10.1097/CM9.0000000000001113