An AI system trained in a screening population has lower specificity than radiologists using PI-RADS v2, so considerable care is essential when applying such systems, researchers reported in an article published by European Radiology on 8 December.
AI has applications in screening as a double reader to radiologists, e.g. in determining the prevalence of clinically significant prostate cancer, explained PhD student Fredrik Langkilde and colleagues at the University of Gothenburg.
Depending on the application, AI systems may have to be tuned for the task at hand, for example to maximize sensitivity in a rule-out system, the authors added. They urge the medical imaging community to carefully validate the performance of AI systems before using them in a screening setting, so that the extent of false positives is understood.
Example case from test set with biparametric MRI data, reference segmentation, and output from AI system. The primary MRI assessment resulted in a PI-RADS 5 lesion in the left dorsal part of the peripheral zone and a PI-RADS 4 lesion in the right dorsal part of the peripheral zone (PSA of 11.6 μg/L). Targeted biopsies from both areas showed ISUP-grade 3. All images are from the same slice location, showing a part of both lesions. (A) T2-weighted (T2W) image. (B) Diffusion-weighted image with a b-value of 1,500 s/mm². (C) Apparent diffusion coefficient map. (D) Reference segmentation with white areas representing tumor. (E) Softmax output from AI system. Inset in the lower left corner shows the detection map (cropped around the prostate) with red voxels representing softmax values of 0.99 and yellow voxels representing softmax values of 0.59. (F) T2W image with reference (D) and detection map (inset in E) overlaid: green represents areas in which both the reference and the AI output are positive (true-positive voxels for AI), red represents areas positive only in the reference (false-negative voxels for AI), and blue represents areas segmented only by AI (false-positive voxels for AI). Fredrik Langkilde et al; European Radiology.
The researchers trained and evaluated an AI system based on a deep-learning segmentation model (nnU-Net method) using MRI data from a prostate cancer screening population. The study population consisted of men retrospectively selected from the Göteborg Prostate Cancer Screening 2 Trial (G2-trial).
The goal of the AI was to detect clinically significant prostate cancer, defined as International Society of Urological Pathology (ISUP) grade 2 or higher. The AI system's performance was compared with that of radiologists using PI-RADS v2. Histopathology was used as the reference standard in the dataset.
To better verify negative cases, 288 men were subject to systematic biopsies regardless of MRI findings, and all men had at least three years of follow-up, according to the authors.
A total of 1,354 MRI examinations in 1,254 men with a median age of 58 years (range, 50-63 years) were randomly divided into a training set (1,086 exams) and a test set (268 exams). The resulting area under the receiver operating characteristic curve (AUROC) was 0.83 (95% confidence interval, 0.73-0.92) for the AI system; however, this was achieved with significantly lower specificity at matched sensitivity levels compared to radiologists, they wrote.
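To make "specificity at matched sensitivity" concrete, here is a minimal sketch of how such a comparison point can be computed from per-exam AI scores. It is not the authors' evaluation code: the function name `specificity_at_sensitivity`, the rule that a score at or above the threshold counts as positive, and the toy scores and labels are all assumptions made for illustration.

```python
def specificity_at_sensitivity(scores, labels, target_sensitivity):
    """Return (threshold, sensitivity, specificity) for the strictest
    threshold whose sensitivity reaches target_sensitivity.

    Assumption: an exam is called positive when its score >= threshold.
    labels: 1 = clinically significant cancer present, 0 = absent.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    # Walk candidate thresholds from strictest to most lenient and stop
    # at the first one that reaches the requested sensitivity.
    for threshold in sorted(set(scores), reverse=True):
        sensitivity = sum(s >= threshold for s in positives) / len(positives)
        if sensitivity >= target_sensitivity:
            specificity = sum(s < threshold for s in negatives) / len(negatives)
            return threshold, sensitivity, specificity
    raise ValueError("target sensitivity is not reachable")


# Hypothetical toy data, not taken from the study.
toy_scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

# Match a (hypothetical) reader sensitivity of 75%, then of 100%.
match_75 = specificity_at_sensitivity(toy_scores, toy_labels, 0.75)
match_100 = specificity_at_sensitivity(toy_scores, toy_labels, 1.0)
```

In this toy data, pushing the sensitivity target from 75% to 100% forces a lower threshold and drops specificity from 5/6 to 4/6, which is the kind of trade-off a rule-out system has to accept.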
Three example cases from the test set with a negative reference (clinically significant prostate cancer not present), false-positive outputs by the AI system and true-negative primary MRI assessment by radiologists. These three cases were selected to represent typical cases of false-positive AI outputs. For each case, a T2-weighted image (a), a high b-value image (b), and an apparent diffusion coefficient map (c) cropped around the prostate are shown from the same slice centered over the AI system’s output. The AI system’s findings are marked with a red circle surrounding the area instead of the segmentation, to improve visibility of the findings. Case 1. Man with prostate-specific antigen (PSA) of 3.0 ng/mL and a prostate volume of 33 mL resulting in a PSA-density of 0.09 ng/mL². Systematic biopsies were performed and showed benign results. The AI system marked a lesion in the left apical part of the prostate. In the area, a periprostatic vein with markedly restricted diffusion can be seen. Case 2. Man with PSA of 13.6 ng/mL and a prostate volume of 91 mL resulting in a PSA-density of 0.15 ng/mL². No biopsies were performed. The AI system marked a lesion in the left transitional zone. In the area, a well-circumscribed lesion in the transitional zone can be seen with markedly restricted diffusion. Case 3. Man with PSA of 2.3 ng/mL and a prostate volume of 30 mL resulting in a PSA-density of 0.08 ng/mL². No biopsies were performed. The AI system marked a lesion in the middle dorsal part of the prostate. In the area, severe pile-up artifacts on the diffusion-weighted sequence caused by gas in the rectum can be seen. Fredrik Langkilde et al; European Radiology.
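The PSA-density values quoted in the caption above follow from dividing the PSA level by the prostate volume. The short sketch below reproduces that arithmetic for the three cases; the helper name `psa_density` and the rounding to two decimals are assumptions for illustration, not something specified in the article.

```python
def psa_density(psa_ng_per_ml, prostate_volume_ml):
    """PSA density in ng/mL^2: PSA (ng/mL) divided by prostate volume (mL)."""
    if prostate_volume_ml <= 0:
        raise ValueError("prostate volume must be positive")
    return psa_ng_per_ml / prostate_volume_ml


# (PSA in ng/mL, prostate volume in mL) as given in the caption.
cases = {
    "Case 1": (3.0, 33),
    "Case 2": (13.6, 91),
    "Case 3": (2.3, 30),
}

# Round to two decimals, matching the precision used in the caption.
densities = {name: round(psa_density(psa, vol), 2) for name, (psa, vol) in cases.items()}

for name, value in densities.items():
    print(f"{name}: PSA density = {value:.2f} ng/mL^2")
```

The rounded results are 0.09, 0.15, and 0.08 ng/mL², in agreement with the values stated for the three cases.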
Langkilde and colleagues also pointed out that the dataset only includes examinations from a screening population, with all MRI examinations performed on the same scanner using the same protocol, which, in theory, significantly limits generalization to other populations and scanners. Also, the training and inference were performed using the segmentation method nnU-Net with default settings. “Better performance might still be achieved with customization of the training and inference procedure,” they wrote.
Some cases lacked histological verification, and a suitable reference standard is crucial when training and evaluating a neural network, they concluded.
To read the full European Radiology article, click here.




















