Automatic Extraction of Imaging Observation and Assessment Categories from Breast Magnetic Resonance Imaging Reports with Natural Language Processing
Introduction
Breast cancer remains one of the most prevalent and lethal cancers among women worldwide. Early detection and accurate diagnosis are critical for improving patient outcomes. Breast magnetic resonance imaging (MRI) has become an essential tool in the diagnosis and management of breast cancer, particularly for high-risk women. The American College of Radiology (ACR) developed the Breast Imaging Reporting and Data System (BI-RADS) lexicon to standardize the terminology used in breast imaging reports, including MRI. However, most radiology reports are still written in free-text form, which poses challenges for data extraction and analysis. Manual extraction of data from these reports is time-consuming, error-prone, and inefficient, especially in studies with large sample sizes. Natural language processing (NLP) offers a promising solution to automate the extraction of structured data from free-text reports, thereby improving efficiency and accuracy in diagnosis and decision-making.
This study aims to evaluate the performance of an NLP program designed to extract BI-RADS descriptors and final assessment categories from breast MRI reports. The goal is to bridge the gap between unstructured report text and structured data, which is essential for clinical decision support systems and other applications.
Methods
Study Population and Data Collection
The study involved a retrospective analysis of 2330 breast MRI reports from the electronic medical records of Peking University First Hospital, collected between March 23, 2009, and June 1, 2017. The mean age of the patients was 50.9 years, with an age range of 13 to 92 years. Inclusion criteria required that biopsy or postoperative pathological results were available at the time of examination or within a 3-month follow-up period. The reports were divided into two sets: 1635 reports were used for the development of the NLP system, and the remaining 695 reports were used as an independent test set for final evaluation.
Revised BI-RADS MRI Lexicon
The ACR BI-RADS MRI lexicon was revised to align with the writing habits of the department. The revised lexicon includes two major categories of descriptors: overall assessment and lesion assessment. Overall assessment is further divided into fibroglandular tissue and background parenchymal enhancement. Lesion assessment includes anatomic locations, morphology, and enhancement kinetics. The descriptors were organized into a simple ontology structure to facilitate NLP processing.
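One way to picture the ontology structure described above is as a nested mapping from category to descriptor to allowed terms. The sketch below is illustrative only. The terms are standard ACR BI-RADS MRI vocabulary, not the department's revised lexicon, which this summary does not list. The names `REVISED_LEXICON` and `leaf_paths` are hypothetical.

```python
# Illustrative ontology only: terms come from the standard ACR BI-RADS MRI
# vocabulary, NOT from the department's actual revised lexicon.
REVISED_LEXICON = {
    "overall_assessment": {
        "fibroglandular_tissue": [
            "almost entirely fat",
            "scattered fibroglandular tissue",
            "heterogeneous fibroglandular tissue",
            "extreme fibroglandular tissue",
        ],
        "background_parenchymal_enhancement": [
            "minimal", "mild", "moderate", "marked",
        ],
    },
    "lesion_assessment": {
        "anatomic_location": ["left breast", "right breast"],
        "morphology": {
            "mass_shape": ["oval", "round", "irregular"],
            "mass_margin": ["circumscribed", "irregular", "spiculated"],
        },
        "enhancement_kinetics": ["persistent", "plateau", "washout"],
    },
}


def leaf_paths(node, prefix=()):
    """Yield (path, term) pairs for every descriptor term in the ontology."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from leaf_paths(child, prefix + (key,))
    else:
        for term in node:
            yield prefix, term


if __name__ == "__main__":
    for path, term in leaf_paths(REVISED_LEXICON):
        print(" > ".join(path), "=", term)
```

Flattening the tree into (path, term) pairs gives a lookup table that a concept-matching step can search directly.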
NLP System Development
The NLP system was built on an in-house program (Smartree Clinical Information System, Beijing, China). It processes each breast MRI report in five steps: section segmentation, sentence segmentation, tokenization, concept matching, and negation detection. Preprocessing identifies section and sentence boundaries, tokenizes imaging features, corrects spelling mistakes, and expands abbreviations. The system then matches the input text to BI-RADS terms in the revised lexicon using both exact matching and synonym matching. Finally, negation detection flags negated or uncertain concepts, and the combined results are used to extract imaging observation descriptors, anatomic locations, and BI-RADS assessment categories for each lesion.
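The processing steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Smartree implementation. It assumes English report text with `FINDINGS:` and `IMPRESSION:` headers, uses a tiny hypothetical synonym table (`SYNONYMS`), and treats a negation cue anywhere before a matched phrase in the same sentence as negating it. Spelling correction and abbreviation expansion are omitted.

```python
import re

# Hypothetical synonym table: surface phrase -> (descriptor, canonical value).
SYNONYMS = {
    "oval": ("mass_shape", "oval"),
    "ovoid": ("mass_shape", "oval"),
    "irregular shape": ("mass_shape", "irregular"),
    "washout": ("kinetic_curve", "washout"),
    "wash-out": ("kinetic_curve", "washout"),
    "lymphadenopathy": ("lymphadenopathy", "present"),
    "enlarged lymph nodes": ("lymphadenopathy", "present"),
}
# Assumed report layout and negation cues; real reports will differ.
SECTION_HEADER = re.compile(r"^(FINDINGS|IMPRESSION):\s*", re.MULTILINE)
NEGATION_CUE = re.compile(r"\b(no|not|without)\b")
CATEGORY = re.compile(r"BI-RADS\s*(?:category\s*)?([0-6])", re.IGNORECASE)


def split_sections(report):
    """Section segmentation: map each header to the text that follows it."""
    parts = SECTION_HEADER.split(report)  # [preamble, header, body, ...]
    return {parts[i]: parts[i + 1].strip() for i in range(1, len(parts) - 1, 2)}


def split_sentences(text):
    """Sentence segmentation on terminal punctuation followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def match_concepts(sentence):
    """Exact/synonym matching plus a simple preceding-cue negation check."""
    lowered = sentence.lower()
    hits = []
    for phrase, (descriptor, value) in SYNONYMS.items():
        found = re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
        if found:
            negated = bool(NEGATION_CUE.search(lowered[: found.start()]))
            hits.append((descriptor, value, negated))
    return hits


def extract(report):
    """Run the whole sketch: descriptors from findings, category from impression."""
    sections = split_sections(report)
    findings = []
    for sentence in split_sentences(sections.get("FINDINGS", "")):
        findings.extend(match_concepts(sentence))
    category = CATEGORY.search(sections.get("IMPRESSION", ""))
    return {
        "findings": findings,
        "birads_category": int(category.group(1)) if category else None,
    }


REPORT = (
    "FINDINGS: An ovoid mass is seen in the left breast. "
    "The kinetic curve shows wash-out. No enlarged lymph nodes.\n"
    "IMPRESSION: BI-RADS category 4."
)

if __name__ == "__main__":
    print(extract(REPORT))
```

Word-boundary matching keeps a short term such as "oval" from firing inside unrelated words. A production system would also need scoped negation and handling for uncertain phrasing.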
Evaluation of the NLP System
The performance of the NLP system was evaluated against a gold standard of manual human review. Two board-certified diagnostic radiologists independently reviewed the 695 test reports, and any discrepancies were resolved by a third reviewer. The NLP system’s recall (sensitivity) and precision (positive predictive value) were calculated for the correct identification of the revised BI-RADS MRI descriptors and BI-RADS categories. The system’s efficiency was also compared to the manual review process in terms of time taken to extract key information.
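For reference, recall is TP / (TP + FN) and precision is TP / (TP + FP), where a true positive is an extracted item that also appears in the gold standard. The sketch below computes both from two sets of extracted items. The tuple layout (report ID, descriptor, value) and the example data are hypothetical and are not taken from the study.

```python
def precision_recall(gold, predicted):
    """Return (precision, recall) for two sets of extracted items."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical items: (report_id, descriptor, value).
GOLD = {
    (1, "mass_shape", "oval"),
    (1, "kinetic_curve", "washout"),
    (2, "mass_shape", "irregular"),
    (2, "lymphadenopathy", "present"),
}
PREDICTED = {
    (1, "mass_shape", "oval"),
    (1, "kinetic_curve", "plateau"),   # wrong value: false positive
    (2, "mass_shape", "irregular"),
    (2, "lymphadenopathy", "present"),
}

if __name__ == "__main__":
    p, r = precision_recall(GOLD, PREDICTED)
    print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy example, one wrong value counts both as a false positive (the extracted "plateau") and as a false negative (the missed "washout").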
Results
Manual Extraction Performance
The first reviewer detected 1258 lesions, with a recall of 97.8% and a precision of 98.1%. The second reviewer detected 1260 lesions, with a recall of 97.1% and a precision of 97.7%. Agreement between the two reviewers was high, with a κ value of 0.95.
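The summary does not say which agreement statistic was used. Assuming it is Cohen's kappa for two raters, κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The sketch below computes it for two equal-length label lists. The function name and example labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:  # both raters used one identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical mass-shape labels from two reviewers for four lesions.
REVIEWER_1 = ["oval", "oval", "round", "irregular"]
REVIEWER_2 = ["oval", "round", "round", "irregular"]

if __name__ == "__main__":
    print(round(cohens_kappa(REVIEWER_1, REVIEWER_2), 3))
```

Here the reviewers agree on three of four lesions (p_o = 0.75) and chance agreement is 5/16, so κ = 7/11, or about 0.64.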
NLP Algorithm Performance
The NLP program detected 1279 lesions, with a recall of 78.5% and a precision of 86.1%. Performance varied by descriptor, with recall and precision ranging from 70.0% to 99.8% for individual descriptors. The NLP system generated results in less than one second, compared with averages of 3.38 and 3.23 minutes per report for the two manual reviewers.
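To put the timing figures in context, the arithmetic below scales them to the 695-report test set. It assumes that "less than one second" applies per report, which the summary does not state explicitly, and uses one second as an upper bound. The variable names are illustrative.

```python
TEST_REPORTS = 695
MANUAL_MIN_PER_REPORT = {"reviewer_1": 3.38, "reviewer_2": 3.23}
NLP_SEC_PER_REPORT_UPPER = 1.0  # assumption: upper bound, per report


def total_hours(n_reports, minutes_per_report):
    """Total review time in hours for a given per-report time in minutes."""
    return n_reports * minutes_per_report / 60


manual_hours = {
    name: total_hours(TEST_REPORTS, minutes)
    for name, minutes in MANUAL_MIN_PER_REPORT.items()
}
nlp_minutes_upper = TEST_REPORTS * NLP_SEC_PER_REPORT_UPPER / 60

if __name__ == "__main__":
    for name, hours in manual_hours.items():
        print(f"{name}: {hours:.1f} h")
    print(f"NLP upper bound: {nlp_minutes_upper:.1f} min")
```

Under these assumptions, manual review of the test set takes roughly 39 and 37 hours for the two reviewers, against under 12 minutes for the NLP program.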
Discussion
The study demonstrates the feasibility of using NLP to extract structured data from free-text breast MRI reports. The NLP program achieved acceptable levels of recall and precision, particularly for descriptors such as background parenchymal enhancement, mass shape, and lymphadenopathy. However, the performance was lower for descriptors like fibroglandular tissue and enhancement kinetic curve, primarily due to non-standard phrases and formatting in the reports.
The NLP system’s efficiency in generating results in less than one second highlights its potential to significantly reduce the time and effort required for data extraction. This efficiency is particularly valuable in large-scale studies and clinical settings where manual review would be impractical.
Limitations and Future Work
The study has several limitations. The NLP system was developed based on the writing habits of a single department, which may limit its generalizability. Future work should involve data from multiple centers to improve the system’s accuracy and robustness. Additionally, the system currently extracts information from all lesions in the reports, rather than focusing on the index lesion, which is most crucial for clinical decision-making. Future versions of the system could be optimized to prioritize the extraction of index lesions and their corresponding imaging features.
Conclusion
The NLP program developed in this study extracted imaging observation descriptors and BI-RADS categories from free-text breast MRI reports with acceptable overall recall (78.5%) and precision (86.1%), and with values of up to 99.8% for individual descriptors. The system's ability to process reports in less than one second offers a significant advantage over manual review, making it a potentially valuable tool for improving efficiency in breast cancer diagnosis and management. By bridging the gap between unstructured report text and structured data, the NLP system has the potential to enhance clinical decision support systems and other applications in radiology.
doi.org/10.1097/CM9.0000000000000301