A new, simple scoring system for breast MRI not only compensates for reader experience, it also gives BI-RADS a run for its money, researchers from Austria found. Known as Tree, the scoring system shows high diagnostic accuracy in mass and nonmass lesions and also improves diagnostic accuracy in nonexpert readers.
Four breast radiologists with different levels of MRI experience and blinded to histopathology evaluated all exams in the single-center study. Readers independently applied two methods to classify breast lesions: BI-RADS and Tree.
BI-RADS provides a reporting lexicon that is empirically translated into likelihoods of malignancy; Tree is a scoring system that results in a diagnostic category. Lead author Dr. Maria Adele Marino, from the Medical University of Vienna, found that the Tree criteria were simple enough to be understood by the readers, who were all trained at different institutions, and that interreader agreement was even higher than with BI-RADS reading (European Radiology, 29 October 2015).
"That was astonishing as BI-RADS is considered a universal language every breast radiologist is trained in from the very beginning," corresponding author Dr. Pascal Baltzer said in an interview with AuntMinnieEurope.com. "On the other hand, it is not astonishing at all, as BI-RADS does not provide an algorithm to convert specific lesion findings into a diagnosis. This explains why Tree is more reproducible: It simply provides rules where BI-RADS does not."
Also, although there are several choices to make with possible differences among readers, the general diagnostic recommendation of biopsy or not is very robust, added Baltzer, also from the Medical University of Vienna. And because Tree also performed well in nonmass lesions, it is a useful addition to BI-RADS.
What is Tree and why should it be used?
Although highly accurate, breast MRI can be challenging: Many different criteria can be used for image interpretation, and technical recommendations allow for wide variation in examination and interpretation quality. The most widely accepted standard is the American College of Radiology (ACR) BI-RADS lexicon, which contains a structured common language for interpretation and reporting of mammography, ultrasound, and MRI.
"Without a doubt, the BI-RADS lexicon facilitates communication among physicians through the use of a standardized terminology," Marino and colleagues wrote. "The MRI BI-RADS lexicon features cover lesion morphology, such as margins, internal enhancement pattern, and functional contrast enhancement kinetics. However, the BI-RADS lexicon does not provide defined rules by which to convert specific imaging features into a diagnostic category."
Also, the use of multiple diagnostic criteria is associated with the risk of information redundancy and, as a consequence, interreader agreement is generally moderate while diagnostic accuracy is highly variable.
Marino's team proposed a classification Tree flowchart as a structured and intuitive algorithm for the differentiation of malignant and benign lesions. In that algorithm, five diagnostic criteria independently contribute to lesion diagnosis, and each specific combination of criteria provides a likelihood of malignancy.
But how does the Tree scoring system measure up? And how does it compare with BI-RADS? The researchers sought to determine just that.
How Tree compares
The researchers included 100 patients with 121 consecutive histopathologically verified lesions (52 malignant, 69 benign). Four breast radiologists with different levels of MRI experience and blinded to histopathology retrospectively evaluated all examinations, which were acquired on a 1.5-tesla Siemens Healthcare Espree system. The two classification methods were compared by receiver operating characteristic (ROC) analysis and kappa statistics.
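For readers less familiar with ROC analysis, the area under the curve (AUC) can be read as the probability that a randomly chosen malignant lesion receives a higher suspicion score than a randomly chosen benign one, with ties counting as half. The short Python sketch below computes it that way. The function name and the toy scores are illustrative assumptions only; they are not the study's data or the software the researchers used.

```python
def auc_from_scores(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    scores: suspicion score per lesion (higher = more suspicious).
    labels: 1 for malignant, 0 for benign (histopathology reference).
    """
    malignant = [s for s, y in zip(scores, labels) if y == 1]
    benign = [s for s, y in zip(scores, labels) if y == 0]
    if not malignant or not benign:
        raise ValueError("need at least one malignant and one benign lesion")

    wins = 0.0
    for m in malignant:
        for b in benign:
            if m > b:
                wins += 1.0   # malignant lesion correctly ranked higher
            elif m == b:
                wins += 0.5   # tie counts as half
    return wins / (len(malignant) * len(benign))


# Made-up toy data: eight lesions scored on a 1-5 scale by one reader.
toy_scores = [5, 4, 4, 2, 3, 2, 1, 1]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_auc = auc_from_scores(toy_scores, toy_labels)
print(f"AUC = {toy_auc:.3f}")
```

Running the same calculation on each reader's scores under both methods would give the per-reader AUC values that an ROC comparison of this kind sets side by side.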
Interreader agreement was substantial to almost perfect (kappa: 0.643-0.896) for Tree and moderate (kappa: 0.455-0.657) for BI-RADS. Diagnostic performance with Tree was similar to that with BI-RADS. Less experienced radiologists achieved area under the curve (AUC) improvements of up to 4.7% using Tree, while an expert's performance did not change (p = 0.526). The least experienced reader improved in specificity using Tree (16%, p = 0.001). The researchers found no further differences in sensitivity or specificity.
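The kappa statistic behind these agreement figures corrects the raw share of matching reads for the agreement two readers would reach by chance. A minimal Python sketch for a single pair of readers follows. It assumes Cohen's kappa for two readers and the verbal bands commonly attributed to Landis and Koch; the article does not say which kappa variant or software the study used, and the reads below are invented for illustration.

```python
from collections import Counter


def cohen_kappa(reader_a, reader_b):
    """Cohen's kappa for two readers who classified the same lesions."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("both readers must rate the same non-empty case list")
    n = len(reader_a)
    # Observed agreement: share of lesions with identical categories.
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Chance agreement: product of each reader's category frequencies.
    count_a, count_b = Counter(reader_a), Counter(reader_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both readers used one identical category throughout
    return (observed - expected) / (1.0 - expected)


def agreement_label(kappa):
    """Verbal band for a kappa value (Landis and Koch convention)."""
    if kappa < 0.0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")]:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Invented reads for ten lesions: biopsy recommended or not.
reads_a = ["biopsy"] * 4 + ["no biopsy"] * 6
reads_b = ["biopsy"] * 3 + ["no biopsy"] * 6 + ["biopsy"]
toy_kappa = cohen_kappa(reads_a, reads_b)
print(f"kappa = {toy_kappa:.3f} ({agreement_label(toy_kappa)})")
```

In this toy case the readers match on 8 of 10 lesions, but chance alone would produce 52% agreement, so kappa comes out well below the raw 80% figure.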
Also of note, all readers achieved 100% sensitivity in nonmass lesions, while specificity stayed similar or improved with Tree. The improvement did not show statistical significance due to the low number of cases, but, similar to mass lesions, the improved performance was strongest in the inexperienced reader, the researchers found.
"These results have important clinical implications: In addition to BI RADS, Tree provides specific guidance about what certain combinations of lesion features indicate with regard to potential malignancy," they wrote. "This simplifies and structures the process of lesion interpretation."
Limitations and future studies
In terms of limitations, one would expect the single-center design of this study to introduce a certain bias toward higher interreader agreement. However, the four readers were trained at four different institutions and underwent only a short training session.
"Therefore, our results clearly demonstrate the high reproducibility of Tree, which was superior to the BI-RADS reading approach," the study authors wrote. "It is not our intent to replace the BI-RADS lexicon. On the contrary: Tree is complementary to BI-RADS, as it provides empirically validated guidance where no specific recommendations are contained in BI-RADS."
Also, the results may not directly apply to a general population. And, because the many negative MRI cases that are never referred for biopsy were not part of this histopathologically verified sample, specificity in routine practice is likely to be higher than the study's figures. Plus, the study was performed considering MRI features only and did not integrate patient characteristics, which yielded higher diagnostic accuracy in a prior study on nonmass lesions.
"At our institution, we more and more rely on the Tree, and I train all my residents, fellows, and observers in it," Baltzer told AuntMinnieEurope.com. "Barriers to be overcome before multi-institutional use of the Tree are mainly due to the typical conservative attitude of medical doctors: They tend not to use what is new and what they do not know."
Next, Marino and colleagues are planning a multi-institutional study with centers across Europe; it already includes sites in Germany, Austria, Italy, and Portugal.
"My hope is that the use of the Tree in this study will help in distributing this useful classification algorithm that is especially helpful for nonexpert readers," he added. "But even in expert readers, the Tree does not perform worse. On the contrary: While the same accuracy is reached, the steps leading to the diagnosis are documented for everyone to reconsider."