A proof-of-concept study suggests that ChatGPT-4 can generate relevant differential diagnoses for specific imaging patterns, a research group from the University of Cologne in Germany has reported.
Researchers led by Dr. Jonathan Kottlors compared the large language model's (LLM) output with that of a panel of experts in four radiology subspecialties and found high concordance rates. The study highlights ChatGPT's potential to support diagnostic decisions, the group wrote.
"One important benefit of the proposed approach is potentially significant time savings compared to traditional literature research, which might be particularly relevant for radiologists in training aiming to reconcile clinical productivity and continuous expansion of knowledge," the researchers wrote in an article published on 5 July in Radiology.
Recognizing imaging patterns and attributing them to certain pathologies are key steps of the diagnostic process in radiology, and doctors often consult relevant literature to verify or expand their diagnoses. LLMs such as ChatGPT-4 by OpenAI make it possible to access and contextualize vast amounts of information.
In this study, the group hypothesized that ChatGPT-4's ability to comprehend and generate human-like text could be used to emulate the process of deriving important differential diagnoses for certain imaging patterns.
The researchers selected four imaging patterns with potential differential diagnoses in neuroradiology and abdominal and musculoskeletal radiology. They then entered text-based descriptions of the patterns into GPT-4 and prompted it to provide the top five most important differential diagnoses.
Next, three experts in each subspecialty provided their consensus on the five most important differential diagnoses for each pattern. The experts were also asked to determine how many of the AI-generated diagnoses were "acceptable."
According to the analysis, GPT-4 attained a concordance of 68.8% (55 of 80) with the experts in determining the top differential diagnoses based on imaging patterns, and 93.8% (75 of 80) of the differential diagnoses proposed by GPT-4 were deemed acceptable alternatives.
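The reported percentages follow directly from the counts, and the comparison itself can be pictured as counting how many of the model's five diagnoses also appear on the expert consensus list. The Python sketch below is illustrative only: the diagnosis labels are hypothetical placeholders, and the exact string matching is an assumption, since in the study the experts themselves judged agreement and acceptability.

```python
def count_concordant(model_dx, expert_dx):
    """Count model diagnoses that also appear in the expert consensus list.

    Assumption: a simple case-insensitive string match stands in for the
    experts' own judgment of whether two diagnoses agree.
    """
    expert = {d.strip().lower() for d in expert_dx}
    return sum(1 for d in model_dx if d.strip().lower() in expert)


def percent(matches, total):
    """Express a count as a percentage of the total."""
    return 100.0 * matches / total


# Hypothetical placeholder labels for one imaging pattern (not study data).
model_top5 = ["dx_a", "dx_b", "dx_c", "dx_d", "dx_e"]
expert_top5 = ["dx_a", "dx_c", "dx_e", "dx_f", "dx_g"]
matches_one_pattern = count_concordant(model_top5, expert_top5)

# Counts reported in the article: 55 of 80 concordant, 75 of 80 acceptable.
reported_concordance = percent(55, 80)
reported_acceptable = percent(75, 80)

print(f"Concordant in the toy example: {matches_one_pattern} of 5")
print(f"Concordance: {reported_concordance:.2f}%")
print(f"Acceptable:  {reported_acceptable:.2f}%")
```

The exact values are 68.75% and 93.75%, which the article rounds to 68.8% and 93.8%.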
"Our investigation serves as a proof-of-concept for the ability of LLMs to generate relevant differential diagnoses for specific imaging patterns, and hence their potential for diagnostic decision support," the researchers wrote.
Further research is warranted, they noted.
"Our results are preliminary and prone to bias from the retrospective study design, requiring verification in a prospective, real-world setting, implying contextualization and integration of non-imaging clinical information," the researchers concluded.