An artificial intelligence (AI) algorithm can label more than 100,000 brain MRI exams in less than 30 minutes, facilitating the creation of large datasets needed for training deep-learning models, according to research published online recently in European Radiology.
Researchers led by Dr. David Wood and senior author Dr. Thomas Booth of King's College London (KCL) in the U.K. have trained and tested a deep-learning algorithm that derives labels for brain MRI exams at scale from their associated radiology reports.
"By overcoming this bottleneck, we have massively facilitated future deep-learning image recognition tasks, and this will almost certainly accelerate the arrival into the clinic of automated brain MRI readers," Booth said in a statement. "The potential for patient benefit through, ultimately, timely diagnosis, is enormous."
The development of high-performing deep-learning models for image recognition tasks in radiology requires the assembly of large labeled datasets, a challenging and tedious task that has been a barrier to algorithm development and adoption. The research team sought to address this problem by developing a deep learning-based report classifier that automatically labels neuroradiology studies based on their reports.
Making use of natural language processing technology, the normal/abnormal report classifier was trained using 3,000 brain MRI exams that had been labeled by a team of neuroradiologists. The researchers then evaluated the classifier's performance on test sets not used during training.
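The labeling pipeline described above can be illustrated with a simple baseline. The sketch below uses a TF-IDF bag-of-words representation with logistic regression; the KCL team's actual model architecture is not detailed in this article, and the report texts and labels here are invented toy examples.

```python
# Minimal sketch of a normal/abnormal report classifier, assuming a
# TF-IDF + logistic regression baseline (not the study's actual model).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled reports: 0 = normal, 1 = abnormal.
reports = [
    "No acute intracranial abnormality. Normal appearances for age.",
    "Unremarkable brain MRI. No mass, haemorrhage or infarct.",
    "Normal study. Ventricles and sulci within normal limits.",
    "Large enhancing mass in the left frontal lobe with surrounding oedema.",
    "Acute infarct in the right MCA territory.",
    "Extensive small vessel disease and generalised atrophy.",
]
labels = [0, 0, 0, 1, 1, 1]

# Fit the classifier on the reference-standard report labels.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(reports, labels)

# Label a new, unseen report (0 = normal, 1 = abnormal).
new_report = ["No mass or infarct. Normal intracranial appearances."]
prediction = clf.predict(new_report)[0]
```

Once trained, such a classifier can be applied to the full archive of reports far faster than manual labeling, which is how the study scaled from 5,000 manually labeled exams to more than 120,000.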
In comparison with the reference-standard report labels, the neuroradiology report classifier yielded an area under the curve (AUC) of 0.991 for 600 reports at King's College Hospital NHS Foundation Trust. It also produced an AUC of 0.990 for 500 reports obtained from Guy's and St. Thomas' NHS Foundation Trust, demonstrating strong generalizability. What's more, the classifier achieved an AUC of 0.973 when tested against reference-standard image labels in 250 studies from KCL.
The model also achieved an AUC of more than 0.95 for all seven specialized categories of abnormalities, with a slight drop in performance (AUC lower by more than 0.02) for three categories: atrophy, encephalomalacia, and vascular, according to the researchers. This lower performance was attributed to discrepancies in the original reports.
[Figure: Study protocol flowchart indicating report and image inclusion numbers at each stage of the study pathway. All 126,556 eligible MRI scans and corresponding neuroradiology reports were included in the study (60,123 unique patients, 32,747 women, mean age 48 ± 18 years). 5,000 reports were randomly selected for labeling by a team of six expert neuroradiologists, each assigned reference-standard labels based on manual inspection of the reports. For a randomly selected subset of 950 of these exams, the images were also manually interrogated by neuroradiologists to derive reference-standard image labels for additional model evaluation. The model was trained on the reference-standard report labels and validated against both these labels and the reference-standard image labels. Once trained, the model assigned labels to the remaining 121,556 reports and corresponding images. Figure courtesy of Dr. Thomas Booth.]

In the next phase of their research, the group is tackling deep-learning image recognition tasks that pose multiple technical challenges of their own. Once this is achieved, the authors said they will then ensure that the developed models can still perform accurately across different hospitals using different scanners.
"Obtaining clean data from multiple hospitals across the UK is an important step to overcome the next challenges," Booth said. "We are running an [National Institute for Health Research] portfolio adopted study across the UK to prospectively collect brain MRI data for this purpose."