GPT-4 -- a multimodal large language model created by OpenAI -- can accurately transform free-text knee MRI reports into structured reports, with minimal human oversight required to fix the rare minor errors identified, a study from New Zealand has found.
"This method can potentially enhance communication with the requesting clinicians, providing a report suited to their preferences, without being burdensome for the reporting radiologist," Dr. Mark Bekhit, a radiology registrar at Auckland City Hospital, noted in an e-poster at the annual scientific meeting (ASM) of the Royal Australian and New Zealand College of Radiologists (RANZCR), held recently in Perth.
Knee MRI is the most commonly performed musculoskeletal cross-sectional imaging exam. Recently there has been a push toward structured radiology reporting, which has been associated with improvements in quality, standardization, and communication with clinicians, explained Bekhit, along with first author Dr. Salam Iwaz, house officer at Auckland City Hospital.
A so-called middle-ground structured knee MRI report, which includes headers for different anatomic compartments and allows for grouping of relevant pathology, appears to be preferred by orthopedic surgeons, but producing structured reports can be a time-consuming and challenging task for radiologists.
Advanced language models, such as GPT-4 through ChatGPT, can be prompted to efficiently convert free-text reports to structured reports, but there are currently limited data on using such language models in this context, they pointed out.
The researchers' objective was to determine the accuracy of GPT-4 in converting free-text knee MRI reports from Auckland City Hospital into the more preferred structured reports.
They conducted a retrospective review of knee MRIs performed between September and December 2023, including scans that had unstructured (free-text) reports and excluding those that already had structured reports. GPT-4 was then prompted to convert the free-text reports into the preferred middle-ground structured format. The research team assessed the converted reports for accuracy of the major diagnosis, positioning of minor findings, missing or wrong information, and accuracy of headings.
Iwaz and Bekhit identified a total of 109 studies. Seventy-one (65%) met the inclusion criteria and were included in the analysis. There were 33 (46%) right knee MRIs and 38 (54%) left knee MRIs.
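As a quick arithmetic check, the reported percentages can be reproduced from the counts given in the article. The short Python sketch below uses only those counts; the variable names and the `percent` helper are illustrative and assume rounding to the nearest whole percent.

```python
# Counts as reported in the article.
identified = 109   # studies identified in the review period
included = 71      # studies meeting the inclusion criteria
right_knee = 33    # included right knee MRIs
left_knee = 38     # included left knee MRIs


def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


inclusion_pct = percent(included, identified)
right_pct = percent(right_knee, included)
left_pct = percent(left_knee, included)

# The right and left knee counts should add up to the included total.
sides_sum_to_included = (right_knee + left_knee == included)

print(inclusion_pct, right_pct, left_pct, sides_sum_to_included)
```

The three percentages come out to 65, 46, and 54, matching the figures quoted by the authors, and the right and left knee counts sum to the 71 included studies.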
They compared the GPT-4 converted reports to the original reports to determine if the major diagnosis was correct, if the correct headings were present, if the positioning of the findings was under the correct headings, and if there were any missing or incorrectly converted findings.
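The comparison described here was carried out manually by the research team. For readers curious how parts of such a check might be automated, the following is a minimal Python sketch, not the authors' method. The heading names in `EXPECTED_HEADINGS`, the `Heading: text` layout, and the example report are all assumptions made for illustration; the article does not give the actual template or any report text.

```python
import re

# Hypothetical compartment headings; the poster's real template is not
# described in the article.
EXPECTED_HEADINGS = ["Menisci", "Ligaments", "Cartilage", "Bone", "Other"]


def parse_sections(report):
    """Split a structured report into {heading: text} using 'Heading:' lines."""
    sections = {}
    current = None
    for line in report.splitlines():
        match = re.match(r"^\s*([A-Za-z /]+):\s*(.*)$", line)
        if match and match.group(1).strip() in EXPECTED_HEADINGS:
            current = match.group(1).strip()
            sections[current] = match.group(2).strip()
        elif current is not None and line.strip():
            # Continuation line: append to the section currently open.
            sections[current] = (sections[current] + " " + line.strip()).strip()
    return sections


def check_report(report, expected_findings):
    """Flag missing headings, missing findings, and misplaced findings.

    expected_findings maps a finding phrase (taken from the original
    free-text report) to the heading it should appear under.
    """
    sections = parse_sections(report)
    missing_headings = [h for h in EXPECTED_HEADINGS if h not in sections]
    missing, misplaced = [], []
    for finding, heading in expected_findings.items():
        found_under = [
            h for h, text in sections.items() if finding.lower() in text.lower()
        ]
        if not found_under:
            missing.append(finding)
        elif heading not in found_under:
            misplaced.append(finding)
    return {
        "missing_headings": missing_headings,
        "missing_findings": missing,
        "misplaced_findings": misplaced,
    }


# Invented example text, not data from the study.
example_report = """Menisci: Horizontal tear of the medial meniscus.
Ligaments: Intact. Small joint effusion.
Cartilage: No focal defect.
Bone: No fracture."""

expected = {
    "medial meniscus": "Menisci",
    "joint effusion": "Other",
    "bone bruise": "Bone",
}

result = check_report(example_report, expected)
print(result)
```

On this invented example, the sketch reports that the "Other" heading is absent, that "joint effusion" sits under the wrong heading, and that "bone bruise" is missing altogether. Simple phrase matching of this kind cannot judge whether the major diagnosis is correct, so it would complement rather than replace human review.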
The figure below shows the results of their analysis of the outcomes measured to assess the accuracy of the structured reports when compared with the original reports.
"We hope to do follow-up studies on this with larger [numbers of] patients in a way that can hopefully be more applicable so that it can be implemented in a healthcare setting," Bekhit told AuntMinnieEurope.com.
You can read the full e-poster in the EPOS section for the RANZCR ASM 2024 via the ESR's website.