A commercially available AI algorithm improved the performance of junior radiologists when grading knee osteoarthritis on x-rays, according to a Danish-led study published on 9 July in Radiology.
In a reader study at three European centers, three of six junior readers performed better with the AI software than without it when grading knee osteoarthritis on the Kellgren-Lawrence scale.
“Concurrent AI assistance improved osteoarthritis grading performance of junior readers and increased interobserver agreement across all readers,” noted lead author Dr. Mathias Brejnebøl of the Bispebjerg and Frederiksberg Hospital in Copenhagen.
Knee osteoarthritis, a serious joint disease characterized by joint pain, stiffness, and functional limitations, affects an estimated 365 million people worldwide, the authors wrote. The Kellgren-Lawrence (KL) grading system ranks osteoarthritis on x-rays from none (score of 0) to severe (score of 4), and several U.S. health insurance providers require a KL grade of 3 or 4 before approving knee arthroplasty, they noted. However, conflicting findings in the medical literature suggest that the system is applied inconsistently, the group added.
Hence, the researchers explored whether assistance with a European-cleared AI tool (RBknee version 2.1, Radiobotics) could improve the interobserver agreement of radiologists and orthopedists of various experience levels when grading the disease. The group collected a total of 225 standing knee x-rays from patients with suspected knee osteoarthritis from three participating European centers between April 2019 and May 2022. Each center recruited four readers across radiology and orthopedic surgery at in-training and board-certified experience levels.
In a clinical setting, the AI tool provides an image overlay and generates a report. For this study, the researchers built a web-based platform in which the grading fields were prefilled with AI tool outputs. All readers graded the images on the KL scale with and without AI assistance, and their grades were compared with a reference standard established by three musculoskeletal radiology consultants.
According to the analysis, AI assistance increased the KL grading performance of three of six junior readers, with areas under the receiver operating characteristic curve (AUC) increasing from 0.81 to 0.88, from 0.76 to 0.86, and from 0.89 to 0.91.
Additionally, board-certified musculoskeletal radiologists achieved strong agreement for grading with AI (κ = 0.90), which was higher than that achieved by reference readers independently (κ = 0.84).
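For readers unfamiliar with the κ statistic, the sketch below shows how agreement between two readers can be computed. It assumes plain (unweighted) Cohen's kappa for a pair of readers; the variant used in the study is not specified here and may differ (for example, a weighted or multi-reader statistic). The function name and the grades are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(grades_a, grades_b):
    """Unweighted Cohen's kappa for two readers grading the same images."""
    if len(grades_a) != len(grades_b) or not grades_a:
        raise ValueError("both readers must grade the same non-empty set of images")
    n = len(grades_a)
    # Observed agreement: fraction of images given the same grade by both readers.
    observed = sum(a == b for a, b in zip(grades_a, grades_b)) / n
    # Chance agreement: derived from each reader's marginal grade frequencies.
    freq_a = Counter(grades_a)
    freq_b = Counter(grades_b)
    expected = sum(freq_a[g] * freq_b[g] for g in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both readers assigned one identical grade to every image
    return (observed - expected) / (1 - expected)


# Hypothetical KL grades (0-4) for ten knees; illustrative only, not study data.
reader_1 = [0, 1, 2, 2, 3, 3, 4, 4, 1, 0]
reader_2 = [0, 1, 2, 3, 3, 3, 4, 4, 2, 0]
kappa = cohen_kappa(reader_1, reader_2)
print(round(kappa, 2))  # prints 0.75
```

In this made-up example the readers agree on 8 of 10 knees (0.80), chance alone would give 0.20, and κ is therefore (0.80 − 0.20) / (1 − 0.20) = 0.75. A value of 1 means perfect agreement and 0 means agreement no better than chance.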
“AI assistance can yield very strong agreement while also maintaining grading performance,” the group wrote. “This is important, as previous studies found that a higher preoperative KL grade was associated with better pain-related and functional outcomes.”
Ultimately, the KL grade is primarily used in research, whereas in clinical practice, a descriptive report is used, the researchers wrote. However, this report commonly assigns “no,” “doubtful,” “mild,” “moderate,” or “severe” knee osteoarthritis to the image, which correspond to the five KL grades, they added.
“AI-assisted grading could enhance patient inclusion consistency in pragmatic randomized clinical trials and will be important as the Kellgren-Lawrence grading system is increasingly used in selecting patient candidacy for knee arthroplasty,” the researchers concluded.