Monitor breast screening on the local level, Dutch say

Feb 27, 2014

Researchers have found significant variations in the performance of pairs of radiologists assigned to double-read screening mammograms in a breast cancer screening program in the Netherlands. The findings emphasize the importance of monitoring radiologist performance on a local scale, they concluded.

Sensitivity is an important indicator in the evaluation of any screening program. Sensitivity is influenced by a number of factors, including the screening interval, age of the target population, mean speed of growth of the mix of tumors, technical level of the screening test, and ability of radiologists to detect early signs of cancer growth.

Dr. Elisabeth Klompenhouwer from the radiology department at Catharina Hospital in Eindhoven, the Netherlands, and colleagues sought to determine variations in mammography screening performance among radiologists performing nonblinded double reading of mammograms. The researchers measured referral rate, cancer detection rate, sensitivity, and positive predictive value (PPV) of referral (European Radiology, 6 February 2014).

In the Netherlands, all mammograms are consistently double-read, routinely in a nonblinded fashion. In the case of a discrepant reading, the two radiologists may discuss the case together to reach a consensus, a third reader may be added for arbitration, or the woman may routinely be referred without consensus reading or arbitration.

Substantial interobserver variability in screening mammography interpretation among screening radiologists has been well documented, but until now, no population-based studies on the screening results of pairs of screening radiologists had been published.

The current study included a total of 310,906 screening mammograms, read by 26 pairs of screening radiologists. The exams were obtained at two screening units in a breast cancer screening area in a southern region of the country (Bevolkingsonderzoek Zuid) between January 1997 and January 2011. Each pair of radiologists had read at least 7,500 screening exams.

In June 2009, film-screen mammography was replaced by full-field digital mammography (Selenia, Hologic). Women with normal or benign mammographic findings, or with nonspecific minimal signs, were not referred. If a mammogram showed a suspicious or malignant lesion, the woman was referred to a surgical oncologist or breast clinic for further analysis.

During a two-year follow-up, breast imaging reports, surgical reports, and pathology results were collected for all referred women and interval cancers. Referral rate, cancer detection rate, PPV, and sensitivity were calculated for each pair.

The referral rate ranged from 1.0% to 1.5%, and the cancer detection rate ranged from 4.0 to 6.3 per 1,000 screens. The program sensitivity and PPV of referral ranged from 55.1% to 81.5% and from 28.7% to 49.5%, respectively.
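For readers who want to see how these four measures relate to one another, the following is a minimal sketch using the conventional screening definitions. It assumes that PPV is the share of referred women with screen-detected cancer and that program sensitivity is screen-detected cancers divided by screen-detected plus interval cancers; the study's exact definitions may differ. The `PairCounts` class, the `screening_metrics` function, and the counts in the example are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class PairCounts:
    """Hypothetical tallies for one pair of screening radiologists."""
    screens: int           # screening mammograms read by the pair
    referrals: int         # women referred for further analysis
    screen_detected: int   # cancers found after referral
    interval_cancers: int  # cancers diagnosed after a negative screen


def screening_metrics(c: PairCounts) -> dict:
    """Compute the four measures under the assumed standard definitions."""
    all_cancers = c.screen_detected + c.interval_cancers
    return {
        "referral_rate_pct": 100 * c.referrals / c.screens,
        "detection_per_1000": 1000 * c.screen_detected / c.screens,
        "ppv_pct": 100 * c.screen_detected / c.referrals,
        "sensitivity_pct": 100 * c.screen_detected / all_cancers,
    }


# Illustrative numbers only, not data from the study.
example = PairCounts(screens=10_000, referrals=120,
                     screen_detected=50, interval_cancers=20)
metrics = screening_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}")
```

Under these definitions, sensitivity depends on a complete count of interval cancers, which is why the follow-up data matter for comparing pairs.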

"In 2009, we reported a significant interobserver variation in the individual screening performances among radiologists," Klompenhouwer and colleagues wrote. "We expected that the addition of a second reader would solve these variations in screening outcome; however, our results proved otherwise."

The high variability in detecting cancer seems to relate to the referral rate. Across the pairs of radiologists, there was no correlation between number of screens and sensitivity, number of screens and PPV, or sensitivity and PPV. The only factor that correlated with increased cancer detection was a higher referral rate.

It also seems that radiologists who follow a woman from start to finish do better when reading mammograms, according to the researchers.

"A radiologist who interprets screening examinations without the opportunity to know the outcome of the recalled women may be hampered in improving his or her recall behavior," they wrote. "Radiologists who also perform diagnostic workup of referred women, including radiologists with fellowship training in breast imaging, build up their experience by assessing a case from screening through workup [until] final diagnosis."

The researchers acknowledged their numbers are too small to calculate these correlations among the pairs of radiologists, but they found significantly better sensitivities in three of the pairs, each of which included at least one dedicated breast radiologist. Conversely, they found worse sensitivities in two screening pairs that included no dedicated breast radiologist; each of these pairs consisted of one radiologist involved in diagnostic breast imaging and one who was not.

"It may be desirable to create specific pairs of radiologists in which radiologists specialized in breast imaging are combined with radiologists who are not," the researchers wrote.

The researchers also noticed that missed interval cancers presented more often as architectural distortion or asymmetry and constituted a significantly higher proportion of invasive lobular and mixed ductal/lobular cancers.

"These findings suggest that more attention should be paid to asymmetries and architectural distortions as possible signs of malignancy at screening mammography," they wrote.

To detect suboptimal results among pairs of radiologists, it is important to monitor screening results on a local scale and to continuously update interval cancers at screening mammography programs, the researchers concluded.