After training on a dataset that included tens of thousands of synthetic images, an artificial intelligence (AI) algorithm achieved over 90% sensitivity for detecting clinical stroke lesions on MRI, Swiss researchers reported in a study published online on 16 September in Radiology: Artificial Intelligence.
A group led by Dr. Christian Federau of the University of Zurich trained deep-learning models to detect clinical stroke lesions on diffusion-weighted MRI using images that were labeled by an expert neuroradiologist, as well as synthetic images that were produced by extracting stroke lesion features from clinical cases and then combining them with normal exams. In testing, the algorithm trained using the expert labels along with 40,000 synthetic images had greater sensitivity, but less specificity, than all three neuroradiologists in the study.
"The method presented is likely generalizable to other pathologies and could substantially improve machine-learning results in medical imaging applications," the authors wrote.
A challenge in training radiology AI algorithms for specific clinical indications is having access to a sufficiently large and accurately labeled image database to produce high performance. The researchers believed, however, that a database of clinical stroke lesion images could be enhanced with synthetic images of stroke lesions generated by combining the extracted stroke lesion features from clinical cases with normal diffusion-weighted brain volumes.
"We hypothesized that the performance of a network trained on a synthetically enriched dataset would be better than a network trained on clinical data alone," they wrote.
They trained a 3D U-Net convolutional neural network using four different datasets: 375 clinical stroke cases labeled by a neuroradiologist with 10 years of experience; 2,000 synthetic cases produced by a synthetic stroke generator algorithm developed by the researchers; the 375 expert-labeled clinical stroke cases plus 2,000 synthetic cases; and the 375 expert-labeled clinical stroke cases plus 40,000 synthetic cases.
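The article does not describe the internals of the researchers' synthetic stroke generator. As a rough illustration of the general idea of combining an extracted lesion with a normal exam, the sketch below pastes lesion intensities into a copy of a normal volume and returns a matching label mask. The function name `make_synthetic_case`, the voxel dictionary format, the translation step, and the use of a maximum to merge intensities are all assumptions made for this example, not the authors' method.

```python
def make_synthetic_case(normal_volume, lesion_voxels, shift=(0, 0, 0)):
    """Paste an extracted lesion into a copy of a normal volume (illustrative only).

    normal_volume: nested lists indexed as [z][y][x] holding voxel intensities.
    lesion_voxels: dict mapping (z, y, x) -> intensity taken from a clinical case.
    shift: (dz, dy, dx) translation applied to the lesion before pasting.
    Returns (synthetic_volume, label_mask); the input volume is left untouched.
    """
    depth = len(normal_volume)
    height = len(normal_volume[0])
    width = len(normal_volume[0][0])

    # Copy the normal volume so the original exam can be reused for other cases.
    volume = [[row[:] for row in plane] for plane in normal_volume]
    mask = [[[0] * width for _ in range(height)] for _ in range(depth)]

    dz, dy, dx = shift
    for (z, y, x), value in lesion_voxels.items():
        zz, yy, xx = z + dz, y + dy, x + dx
        # Skip lesion voxels that fall outside the volume after shifting.
        if 0 <= zz < depth and 0 <= yy < height and 0 <= xx < width:
            # Assumption: keep the brighter of the two intensities, since the
            # lesion is expected to be brighter than normal tissue here.
            volume[zz][yy][xx] = max(volume[zz][yy][xx], value)
            mask[zz][yy][xx] = 1
    return volume, mask


# Toy data: a uniform 2 x 4 x 4 "normal" volume and a two-voxel lesion.
normal = [[[100.0] * 4 for _ in range(4)] for _ in range(2)]
lesion = {(0, 1, 1): 250.0, (0, 1, 2): 240.0}
synthetic, label = make_synthetic_case(normal, lesion, shift=(1, 1, 0))
```

Because each synthetic volume comes with its own label mask, cases generated this way could be added to a segmentation training set without further manual annotation, which is the appeal of the approach described in the article.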
In lesion segmentation, the model trained with the clinical stroke cases and 40,000 synthetic cases yielded the highest Dice coefficient, 0.72. In comparison, two neuroradiologists with two years of experience produced an interreader Dice score of 0.76.
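For readers unfamiliar with the metric, the Dice coefficient measures the overlap between two segmentations as twice the number of voxels they share, divided by the total number of voxels in both. The sketch below computes it for two binary masks given as flat sequences of 0s and 1s (a 3D mask would be flattened first). The function name and the convention of returning 1.0 when both masks are empty are assumptions for this example; the study's exact implementation is not described in the article.

```python
def dice_coefficient(pred, truth):
    """Dice = 2 * |A and B| / (|A| + |B|) for two binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")

    # Voxels marked as lesion in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total lesion voxels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)

    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


# Toy masks: three lesion voxels each, two of them shared.
model_mask = [1, 1, 1, 0, 0, 0]
reader_mask = [0, 1, 1, 1, 0, 0]
score = dice_coefficient(model_mask, reader_mask)  # 2 * 2 / (3 + 3)
```

A score of 1 means the two masks are identical and 0 means they do not overlap at all, so the reported 0.72 and 0.76 sit on the same scale and can be compared directly.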
Next, the researchers tested the algorithms' detection performance on an independent test set of 192 cases, including 74 positive and 118 negative exams.
Stroke lesion detection performance on test set

| Measure | Neuroradiologist 1 | Neuroradiologist 2 | Experienced neuroradiologist (reference standard) | Algorithm trained on 375 human-labeled clinical stroke cases | Algorithm trained on clinical stroke cases + 2,000 synthetic cases | Algorithm trained on clinical stroke cases + 40,000 synthetic cases |
| --- | --- | --- | --- | --- | --- | --- |
| Sensitivity | 78% | 79% | 84% | 85% | 80% | 91% |
| Specificity | 92% | 89% | 96% | 48% | 76% | 75% |
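As a reminder of how the two measures in the table are defined, sensitivity is the share of positive exams that are correctly flagged, and specificity is the share of negative exams that are correctly cleared. The sketch below computes both from confusion-matrix counts. The counts used are hypothetical, chosen only so that they add up to the test set's 74 positive and 118 negative exams; the article does not report per-reader or per-model counts.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)  # true positives / all positive exams
    specificity = tn / (tn + fp)  # true negatives / all negative exams
    return sensitivity, specificity


# Hypothetical counts (not from the study): 74 positive and 118 negative exams.
tp, fn = 60, 14   # positive exams flagged vs. missed
tn, fp = 100, 18  # negative exams cleared vs. falsely flagged
sens, spec = sensitivity_specificity(tp, fn, tn, fp)
```

The two measures trade off against each other: a detector that flags more exams will usually miss fewer strokes but raise more false alarms, which matches the pattern in the table, where the algorithms pair higher sensitivity with lower specificity than the readers.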
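The authors found a clear increase in performance when the number of additional synthetic image volumes was increased from 2,000 to 40,000. On the test set, sensitivity rose from 80% to 91%, while specificity was essentially unchanged (76% vs. 75%).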
"Because the synthetic lesions were produced randomly out of a pool of 2,027 normal volumes and 375 volumes with stroke, this shows that a combinatoric effect took place and that the method presented permits a meaningful extension to the base set on which the training was performed," they wrote. "Further work should investigate whether the diversity of the synthetic stroke lesions could be additionally enhanced using generative methods, such as generative adversarial network or variational auto-encoder, and if the final performance on segmentation and lesion detection could be further increased."