Artificial intelligence (AI) software performed almost as well as senior radiologists at evaluating breast density and estimating five-year breast cancer risk among women in France, according to a study of women screened in 2019 that was published on 17 August in Diagnostic and Interventional Imaging.
An AI algorithm that assesses breast density from digital mammograms achieved agreement with radiologists just shy of the "almost perfect" category. Thanks to technical advances, the algorithm performed considerably better than older AI models, which have tended to perform relatively poorly at evaluating breast density.
"Our study demonstrates the value of an AI model for predicting the risk of [breast cancer] at five years, as well as the high concordance between senior and junior radiologists for breast density assessment," noted Dr. Morwenn Le Boulc'h, Prof. Isabelle Thomassin-Naggara and colleagues from the department of radiology at Sorbonne University Hospital Tenon in Paris.
The new study included 311 women consecutively screened for breast cancer at a Paris clinic in January and February of 2019. The women were between the ages of 40 and 74 and underwent both full-field digital mammography and digital breast tomosynthesis.
Four radiologists independently read both sets of mammograms and assigned breast density categories according to BI-RADS: A or B for nondense breast tissue and C or D for dense breast tissue. The reading radiologists included two junior clinicians with one year of experience and two senior clinicians with at least five years of experience.
The authors compared the radiologists' BI-RADS density categorizations to those automatically generated by DenseeMammo, an AI software program from Predilife. The program was trained using more than 10,000 mammograms.
The DenseeMammo algorithm showed substantial agreement with both the senior and junior radiologists, as measured by quadratic weighted kappa (κ) coefficients. In this type of analysis, scores closer to 1 indicate better agreement: scores of 0.61-0.80 indicate substantial agreement, and scores of 0.81-0.99 suggest almost perfect agreement.
Compared with junior radiologists, the AI program achieved a kappa score of 0.76, much higher than the kappa scores of 0.46-0.61 reported for earlier AI tools for breast density analysis, the authors noted. The program performed even better when compared with senior radiologists, with a kappa score of 0.79, just shy of almost perfect agreement.
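Quadratic weighted kappa penalizes each disagreement by the squared distance between the ordinal categories, so an A-versus-D disagreement costs nine times as much as an A-versus-B one. The following is a minimal Python sketch of the statistic, not the study's analysis code; the function names and toy ratings are hypothetical, and the interpretation bands are the conventional Landis and Koch cut-offs.

```python
from typing import Sequence

CATEGORIES = ["A", "B", "C", "D"]  # BI-RADS density categories, in ordinal order


def quadratic_weighted_kappa(rater1: Sequence[str], rater2: Sequence[str],
                             categories: Sequence[str] = CATEGORIES) -> float:
    """Cohen's kappa with quadratic disagreement weights (hypothetical helper)."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater1)
    if n == 0 or n != len(rater2):
        raise ValueError("ratings must be non-empty and of equal length")

    # Observed k x k confusion matrix between the two raters.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater1, rater2):
        observed[index[a]][index[b]] += 1

    # Marginal totals: how often each rater used each category.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2  # quadratic weight in [0, 1]
            num += w * observed[i][j] / n
            den += w * row[i] * col[j] / (n * n)
    if den == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - num / den


def landis_koch(kappa: float) -> str:
    """Map a kappa value to the conventional Landis and Koch label."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    # Toy ratings for eight hypothetical patients (not study data).
    reader = ["A", "B", "B", "C", "C", "D", "C", "B"]
    ai = ["A", "B", "C", "C", "C", "D", "D", "B"]
    kappa = quadratic_weighted_kappa(reader, ai)
    print(f"kappa = {kappa:.2f} ({landis_koch(kappa)})")
```

Both reported values, 0.76 and 0.79, fall in the "substantial" band under these cut-offs. For real analyses, scikit-learn's `cohen_kappa_score(y1, y2, labels=["A", "B", "C", "D"], weights="quadratic")` computes the same statistic; passing `labels` keeps the category spacing fixed even if a category is unused.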
The authors further assessed the performance of AI for estimating five-year breast cancer risk using a second Predilife program called MammaRisk. The program used breast density, family history, and prior biopsy results to assign patients into a low, moderate, high, or very high-risk category.
Intraclass correlation coefficient (ICC) analysis revealed very strong agreement between the AI program's and the radiologists' risk assessments. The AI program showed excellent reliability for five-year risk estimation, with an ICC of 0.96 compared with senior radiologists and 0.95 compared with junior radiologists.
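The article does not say which form of the ICC was used. As an illustration only, the sketch below assumes ICC(2,1) in Shrout and Fleiss notation (two-way random effects, absolute agreement, single rater), computed from a two-way ANOVA decomposition as ICC(2,1) = (MSR - MSE) / (MSR + (k - 1)MSE + k(MSC - MSE)/n), where n is the number of patients, k the number of raters, and MSR, MSC and MSE are the patient, rater and residual mean squares. The function name and the example scores are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores[i][j] is the value that rater j assigned to patient i.
    Hypothetical helper; the study's exact ICC variant is not stated.
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete n x k table with n, k >= 2")

    values = [x for row in scores for x in row]
    grand = fmean(values)
    row_means = [fmean(row) for row in scores]
    col_means = [fmean([scores[i][j] for i in range(n)]) for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for x in values)
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    # Hypothetical five-year risk estimates (%) for five patients:
    # column 0 = AI program, column 1 = radiologist. Not study data.
    risks = [[1.2, 1.1], [2.5, 2.7], [0.8, 0.9], [3.9, 3.6], [1.7, 1.8]]
    print(f"ICC(2,1) = {icc_2_1(risks):.3f}")
```

Under an absolute-agreement ICC, a rater who scores every patient one step higher than another lowers the coefficient even though both rank the patients identically; a consistency-type ICC would ignore that constant offset.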
The results demonstrate that improvements in the mechanisms behind AI programs, including the use of deep-learning convolutional neural networks, are markedly improving their clinical performance.
Number of patients assigned to each BI-RADS breast density category

| Reader | BI-RADS A | BI-RADS B | BI-RADS C | BI-RADS D |
|---|---|---|---|---|
| Junior radiologist | 33 | 104 | 159 | 15 |
| Senior radiologist | 28 | 122 | 151 | 10 |
| AI program | 25 | 114 | 151 | 21 |
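Collapsing the four BI-RADS categories in the table into the nondense (A or B) and dense (C or D) groups shows how the readers' overall distributions compare. The short Python sketch below uses the counts from the table; the variable and function names are illustrative only.

```python
# Counts per BI-RADS density category, taken from the table.
counts = {
    "Junior radiologist": {"A": 33, "B": 104, "C": 159, "D": 15},
    "Senior radiologist": {"A": 28, "B": 122, "C": 151, "D": 10},
    "AI program": {"A": 25, "B": 114, "C": 151, "D": 21},
}

NONDENSE = ("A", "B")  # nondense breast tissue
DENSE = ("C", "D")     # dense breast tissue


def dense_split(category_counts: dict) -> tuple:
    """Return (nondense, dense) patient counts for one reader."""
    nondense = sum(category_counts[c] for c in NONDENSE)
    dense = sum(category_counts[c] for c in DENSE)
    return nondense, dense


for reader, category_counts in counts.items():
    nondense, dense = dense_split(category_counts)
    total = nondense + dense
    print(f"{reader}: {nondense} nondense, {dense} dense "
          f"({100 * dense / total:.1f}% dense, n={total})")
```

By this tally, senior radiologists classed 161 of 311 women (about 51.8%) as dense, compared with 172 (about 55.3%) for the AI program and 174 (about 55.9%) for junior radiologists. Similar totals do not imply patient-level agreement, however: marginal counts cannot show which individual women were reclassified, which is what kappa and the count of miscategorized patients capture.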
However, the authors cautioned that the software is still not perfect. Compared with the performance of senior radiologists, the AI program miscategorized 39 patients, including 14 women who were mistakenly classified as having nondense breast tissue.
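The 39 discordant cases can be put in proportion with simple arithmetic. This sketch, using a hypothetical helper, computes raw percentage agreement with senior radiologists from the counts reported in the article. Raw agreement is not chance-corrected, so it is not directly comparable to the kappa values.

```python
def discordance_summary(total: int, miscategorized: int, to_nondense: int) -> dict:
    """Raw agreement figures for one AI-versus-reader comparison (hypothetical helper)."""
    agreed = total - miscategorized
    return {
        "agreed": agreed,
        "percent_agreement": 100 * agreed / total,
        # Women the article describes as mistakenly classified as nondense.
        "percent_to_nondense": 100 * to_nondense / total,
        "share_of_errors_to_nondense": 100 * to_nondense / miscategorized,
    }


summary = discordance_summary(total=311, miscategorized=39, to_nondense=14)
print(f"{summary['agreed']}/311 agreed ({summary['percent_agreement']:.1f}%); "
      f"{summary['percent_to_nondense']:.1f}% of all women were moved to nondense")
```

On these figures, the AI program agreed with senior radiologists for 272 of 311 women (about 87.5%), and the 14 women misclassified as nondense make up about 4.5% of the cohort and roughly 36% of the discordant cases.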
"Even if this is lower than the standard error risk, this misidentification could have an impact on routine practice and could exclude these patients from undergoing supplementary screening investigation such as ultrasonography," the authors wrote.
Another major limitation is that the breast density software can be used only with digital mammography, even as digital breast tomosynthesis (DBT) becomes more prominent in breast cancer screening. As a result, the authors called for future studies evaluating the performance of AI on DBT.
"Our results demonstrate a good correlation between the evaluation of breast density by radiologists and AI systems and on [breast cancer] risk evaluation," the authors concluded. "Further studies need to be performed to implement the evaluation of [DBT] in a larger multicenter cohort."