Artificial intelligence (AI) shows promise for detecting breast cancers on screening mammography when compared with double reading in a dataset of nearly 123,000 screening examinations, according to research published on 29 March in Radiology.
A team led by Solveig Hofvind, PhD, from the Section for Breast Cancer Screening at the Cancer Registry of Norway, found that the proportion of screen-detected cancers not selected by AI at the evaluated thresholds was less than 20%, and that several of these might be detected at an early stage in the next screening round.
"This is the largest study on real screening data so far, and it is difficult to compare with other studies as they are usually smaller and have used an enhanced data set," Hofvind told AuntMinnieEurope.com.
Double reading is standard practice in most European breast screening programs. However, informed reviews have classified about 25% of screen-detected and interval cancers as missed, and have found that 20% of screen-detected cancers were recommended for recall by only one of the two radiologists in independent double reading.
Previous research suggests AI can detect cancers in mammographic examinations. But studies have analyzed mostly small population sizes with enriched data sets, leaving evidence gaps.
"Retrospective studies on clinical data sets using consecutive examinations provide an opportunity to independently validate AI systems before evaluation in prospective studies," the researchers wrote.
Hofvind and colleagues set out to compare the performance of a commercially available AI system (Transpara version 1.7.0, ScreenPoint Medical) with independent double reading with consensus in a population-based screening program. They also looked at the histopathologic characteristics of tumors with different AI scores, which range from 1 to 10.
A score of 1 meant low risk of breast cancer while a score of 10 indicated high risk. Three thresholds were also used to assess the AI system's performance in selecting or not selecting exams for suspicious findings. Threshold one was set at an AI score of 10, threshold two was set for a selection rate similar to the consensus rate (8.8%), and threshold three was set for a selection rate similar to an average individual radiologist (5.8%).
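The two kinds of threshold described above can be illustrated with a minimal sketch. It is not the vendor's or the researchers' implementation: the function names are hypothetical, and the sketch assumes that each exam also has a continuous raw score that can be ranked to reach a target selection rate.

```python
import math


def select_at_score(scores, cutoff=10):
    """Indices of exams whose categorical AI score (1-10) is at least `cutoff`."""
    return [i for i, score in enumerate(scores) if score >= cutoff]


def select_top_fraction(raw_scores, target_rate):
    """Indices of the highest-scoring exams so that roughly `target_rate`
    of all exams are selected (assumes a rankable raw score per exam)."""
    n_select = math.ceil(target_rate * len(raw_scores))
    ranked = sorted(range(len(raw_scores)),
                    key=lambda i: raw_scores[i], reverse=True)
    return sorted(ranked[:n_select])


if __name__ == "__main__":
    # Toy data only; these are not values from the study.
    toy_scores = [1, 10, 7, 10, 3]
    toy_raw = [i / 50 for i in range(50)]
    print(select_at_score(toy_scores))          # threshold one: score of 10
    print(select_top_fraction(toy_raw, 0.088))  # rate similar to consensus
    print(select_top_fraction(toy_raw, 0.058))  # rate similar to one radiologist
```

A lower target rate selects fewer exams, which is why threshold three flags fewer cancers than threshold one.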
A total of 957 breast cancers were included in the study: 752 screen-detected cancers and 205 interval cancers. These came from 122,969 screening examinations performed between 2009 and 2018 in 47,877 women in Norway.
The team found that the AI system gave a score of 10 to 745 of the 957 breast cancers (77.8%). With threshold one, that covered 653 of the screen-detected cancers (86.8%) and 92 of the interval cancers (44.9%).
Meanwhile 602 of the total screen-detected cancers (80.1%) and 63 of the interval cancers (30.7%) were selected by the system when threshold three was used.
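The subgroup percentages above follow directly from the reported counts. The short sketch below recomputes them, along with the share of cancers not selected at each threshold; the dictionary layout and function names are illustrative and do not come from the study.

```python
# Counts as reported in the article (threshold one = AI score of 10).
REPORTED = {
    "screen-detected": {"total": 752, "threshold one": 653, "threshold three": 602},
    "interval": {"total": 205, "threshold one": 92, "threshold three": 63},
}


def pct(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


def selected_share(cancer_type, threshold):
    """Percentage of cancers of this type selected by the AI at a threshold."""
    group = REPORTED[cancer_type]
    return pct(group[threshold], group["total"])


def not_selected_share(cancer_type, threshold):
    """Percentage of cancers of this type NOT selected at a threshold."""
    group = REPORTED[cancer_type]
    return pct(group["total"] - group[threshold], group["total"])


if __name__ == "__main__":
    for cancer_type in REPORTED:
        for threshold in ("threshold one", "threshold three"):
            print(cancer_type, threshold,
                  selected_share(cancer_type, threshold),
                  not_selected_share(cancer_type, threshold))
```

For screen-detected cancers at threshold three, the share not selected works out to 19.9%, which matches the "less than 20%" figure cited by the researchers.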
Screen-detected cancers that were not selected at the thresholds had more favorable histopathologic characteristics than those that were selected. However, the researchers found the opposite for interval cancers.
The study authors wrote that their results show the AI system can safely select exams that do not need to be interpreted by radiologists, reducing radiologists' reading volume. Radiologists could identify the small number of missed cancers if they double read alongside the AI system, the researchers wrote.
"Furthermore, 23% of screen-detected cancers in the study had a positive assessment by only one radiologist, and, thus, it may be acceptable that some cancers have a low AI score," they added.
Hofvind said more retrospective studies are needed to explore different designs for using the AI system in combination with radiologists, as well as large trials to test different interpretation designs to best use radiologists and AI in harmony.
"We are testing different interpretation designs, and different vendors, reviewing the cases with discordant findings between the AI and radiologists, and are planning prospective studies," she told AuntMinnieEurope.com.