Stand-alone AI-based software performed well at detecting fractures in pediatric patients in a clinical setting and improved the diagnostic accuracy of emergency room (ER) readers, a new German study shows.
The study, supervised by Dr. Daniel Gräfe of the Institute for Medical Informatics, Statistics, and Epidemiology at Leipzig University and Maciej Rosolowski of the Department of Pediatric Radiology at Leipzig University Hospital, was published in European Radiology on 7 April. Gräfe, Rosolowski, and their colleagues aimed to assess the ability of a standalone AI program to diagnose fractures in pediatric patients in a real-world setting, and to determine how AI assistance influences the diagnostic performance of inexperienced ER physicians reading pediatric x-rays.
Radiograph of the right foot of an 8-year-old girl showing fractures of the distal second and third metatarsal bones. The AI produced false-positive findings at the apophysis of the fifth metatarsal bone, the fourth metatarsal, and the proximal phalanx of the fifth toe. The latter two findings are marked with a question mark due to the AI’s low confidence level. Images available for republishing under Creative Commons license (CC BY 4.0 DEED, Attribution 4.0 International) and courtesy of the European Journal of Radiology.
Pediatric fractures have unique traits that can make them challenging to detect, especially for less-experienced readers and nonradiologists. While there are numerous CE-marked algorithms available to aid nonradiologist physicians and staff in their initial assessment of x-rays in the ER, the researchers pointed out that only a few include diagnostic capabilities for pediatric patients. An AI model trained solely on adult fractures may not perform adequately in assessing pediatric fractures.
As the authors pointed out, missed fractures in children may carry more long-term clinical consequences, such as impaired or altered growth and limited joint mobility. Moreover, these missed diagnoses may also have medicolegal consequences, with long-term damage resulting from misinterpretation of x-rays being a common reason for malpractice suits.
With musculoskeletal injuries being the most common reason for ER visits among children, tools offering accurate initial assessment -- and “second look” AI tools suited to pediatric patients -- are critical.
For the “real life” cohort of the study, the researchers included 1,672 radiographs of 1,657 children ages 2 to less than 18 years who were referred to the pediatric surgical emergency department of the researchers' tertiary center between March and October 2023. For the “medicolegal” comparison cohort, they selected radiographs from 2008 to 2023 showing one of three pediatric-specific, medicolegally significant fracture types from the institutional radiological information system.
The three readers consisted of two surgical residents from the pediatric surgery emergency department with three and six months of experience in reading pediatric x-rays, and a general radiology resident with two years of experience reporting pediatric x-rays. With only medical history and conventional radiographs available to them, the readers were asked (independently, in separate sessions) to determine the presence and location of possible fractures. They were then shown the AI program’s assessment and asked to reassess the x-ray.
In the real-life cohort, the readers correctly diagnosed 87.5% of patients without AI; there was no significant difference between the surgical and radiological residents. AI reduced the readers' mean number of missed fractures from 129 (13.8% of all fractures) to 100 (10.7% of all fractures), a relative decrease of 22%. The addition of AI corrected the readers’ initial incorrect diagnosis in 4.1% of cases, while the readers’ initial correct diagnosis was wrongly rejected by AI in 1.5% of cases. With AI assistance, patient-wise sensitivity increased from 83.7% to 87.3% and specificity from 90.7% to 92.4%.
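The 22% figure is a relative reduction, not a percentage-point change. A minimal sketch of the arithmetic, using only the numbers reported above (the function and variable names are illustrative and do not come from the study):

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional drop from `before` to `after`, relative to `before`."""
    return (before - after) / before


# Mean number of missed fractures per reader, as reported in the article.
missed_without_ai = 129
missed_with_ai = 100

# Miss rates as a share of all fractures, as reported in the article.
miss_rate_without_ai = 0.138
miss_rate_with_ai = 0.107

# Relative reduction in missed fractures (the article's "22%").
rel = relative_reduction(missed_without_ai, missed_with_ai)

# Absolute change in the miss rate, in percentage points.
abs_points = (miss_rate_without_ai - miss_rate_with_ai) * 100

# Total fracture count implied by each count/rate pair (a consistency check).
implied_total_without = missed_without_ai / miss_rate_without_ai
implied_total_with = missed_with_ai / miss_rate_with_ai

print(f"relative reduction: {rel:.1%}")
print(f"absolute reduction: {abs_points:.1f} percentage points")
print(f"implied total fractures: {implied_total_without:.0f} vs {implied_total_with:.0f}")
```

Both count and rate pairs imply roughly the same total number of fractures, so the reported figures are consistent with one another. The miss rate itself falls by about three percentage points.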
Frequently missed fracture entities with medicolegal significance according to van Laer. (a) Radial condyle fracture, (b) fracture of the proximal tibia, and (c) fracture of the medial malleolus. The lucency marked by an arrow indicates the fracture. Sens, sensitivity; Spec, specificity.
The authors noted that more than half of the algorithm's false-positive findings for this cohort were attributable to the apophysis of the fifth metatarsal bone; they added that inexperienced readers often find it challenging to distinguish between fracture and apophysis in children.
For the medicolegally relevant cohort, AI attained sensitivity of 100% for proximal tibia fractures and 96% for medial ankle fractures, but only 68% for radial condyle fractures. The authors noted the latter finding as a cause for concern, because the three fracture types were chosen for their medicolegal significance and the elbow is a particularly common area for misinterpretation.
The residents’ patient-wise sensitivity was improved with AI assistance from 84% to 87%, specificity from 91% to 92%, and diagnostic accuracy from 88% to 90%. However, the residents discarded correct diagnoses that AI rejected in 2% of cases.
The authors also noted that the readers’ diagnostic confidence was higher with correct diagnoses and lower with incorrect ones, whether or not they were assisted by the algorithm. Interestingly, in cases where the algorithm’s interpretation led the readers to change their diagnosis, initial confidence was low but improved in two of the three readers when AI led to the correct decision. However, confidence decreased significantly in two of the readers when AI led to a change from a correct to an incorrect diagnosis.
Nevertheless, while cautioning that the algorithm would benefit from further training on certain kinds of fractures, especially given its low performance on radial condyle fractures, the authors noted the high level of diagnostic accuracy attained by the AI and felt that its benefits were significant enough to merit consideration.
“Despite the limited overall increase in accuracy, based on our experience with fracture detection AI software, even experienced pediatric radiologists benefit from this ‘second pair of eyes,’ leading to a reduction in satisfaction-of-search errors. This is especially true when fractures are located in unexpected areas, such as a partially imaged bony ligament tear in the upper ankle joint, even though the forefoot was the primary focus,” they wrote.