GPT-4 assists in management of glioblastoma patients

Jul 26, 2024

OpenAI's GPT-4 AI model can utilize imaging reports to generate summaries of disease course in patients with complex glioblastoma, improving treatment planning and potentially even enhancing radiology workflows, according to research published on 23 July in Radiology.

For a retrospective multicenter study, the large language model generated summaries from information included in 375 neuroradiologist-written brain MRI reports (the five most recent MRI scans of each of 75 patients with confirmed diagnoses of glioblastoma between August 2018 and March 2023, from four centers in Germany: University Hospitals Cologne, Düsseldorf, Bonn, and Essen).

No image data were transmitted to GPT-4. Instead, the radiologist-generated reports supplied descriptive findings and a final impression, along with histologic and molecular characteristics, treatment history, and current clinical status as part of the medical report, according to co-lead authors Kai Laukamp, MD, and Robert Terzis, MD, of the University of Cologne in Germany, and colleagues.

The researchers gave GPT-4 three tasks:

To generate a summary of the disease course
To provide an update on status based on the MRI reports in text form
To provide an update on status based on the MRI reports in an R code-generated graph of a patient’s tumor course and disease progression

For each patient, a zero-shot prompt and the five MRI reports were entered and processed separately in the input field of the GPT-4 web interface, Laukamp and colleagues noted.
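Zero-shot prompting means the model receives an instruction and the source text but no worked examples of the desired output. The Python sketch below illustrates how such a prompt could be assembled for one patient. It is only an illustration under stated assumptions: the instruction wording, the function name `build_zero_shot_prompt`, and the placeholder report strings are hypothetical, and the study used the GPT-4 web interface rather than code.

```python
from typing import List

# Hypothetical instruction text; the wording used in the study is not given here.
INSTRUCTION = (
    "Summarize the disease course of this patient with glioblastoma "
    "based on the following brain MRI reports."
)


def build_zero_shot_prompt(reports: List[str], instruction: str = INSTRUCTION) -> str:
    """Combine one instruction with a patient's reports, without any worked examples."""
    if len(reports) != 5:
        raise ValueError("expected the five most recent MRI reports")
    # Number each report so the chronological order stays explicit in the prompt.
    sections = [
        f"MRI report {i}:\n{text.strip()}" for i, text in enumerate(reports, start=1)
    ]
    return instruction + "\n\n" + "\n\n".join(sections)


# Placeholder strings stand in for real report text; no patient data is shown.
placeholder_reports = [f"<text of report {i}>" for i in range(1, 6)]
prompt = build_zero_shot_prompt(placeholder_reports)
print(prompt)
```

Because each patient is handled separately, one such prompt would be built per patient, with nothing carried over from one case to the next.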

Among the findings the researchers reported, GPT-4 achieved overall agreement with the expert consensus in 68 of 75 cases (91%) for adequately representing the disease course. In detail, the following were "adequately represented":

Necessary information, 71 of 75 (95%)
Relevant incidental findings, 10 of 12 (83%; most disease courses did not show any relevant incidental findings [63 of 75; 84%])
Medical history, 67 of 75 (89%)
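The reported percentages follow directly from the case counts. As a quick arithmetic check, the short Python sketch below recomputes them, rounding to the nearest whole percent; the dictionary labels and the helper name `percent` are chosen here for illustration only.

```python
# Case counts as reported in the article: (cases adequately represented, cases assessed).
counts = {
    "Overall agreement": (68, 75),
    "Necessary information": (71, 75),
    "Relevant incidental findings": (10, 12),
    "No relevant incidental findings": (63, 75),
    "Medical history": (67, 75),
}


def percent(numerator: int, denominator: int) -> int:
    """Return the proportion as a percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


percentages = {label: percent(n, d) for label, (n, d) in counts.items()}

for label, (n, d) in counts.items():
    print(f"{label}: {n} of {d} ({percentages[label]}%)")
```

The denominator of 12 for relevant incidental findings is the 75 disease courses minus the 63 that showed no relevant incidental findings.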

The GPT-4 algorithm visualized the tumor course, highlighting tumor responses and progress, according to the authors. In addition, the article in Radiology provided an English translation of the original German summary for the case of a 48-year-old female patient.

Summaries created by GPT-4 regarding the disease course received a median quality score of 4 and a median perceived utility score of 3, the researchers reported. Laukamp and colleagues also said the summaries proved useful for treating physicians and added that none of the summaries were rated as having negative consequences such as omitting relevant information or adding incorrect information.

"In clinical routine, GPT-4 summaries could improve preparations for patients with complex glioblastoma in multidisciplinary tumor boards and potentially improve radiology workflows by providing a comprehensive overview of prior imaging, thus ensuring faster disease status assessments," wrote the authors. "Furthermore, our experimentation with GPT-4–generated R code for graphical representation of the disease course suggests potential for diverse future applications in viewing patient data."

The researchers acknowledged limitations of the study. It relied on summaries of clinical status based on variable text information, which poses challenges to reliability and standardization, especially regarding lesion measurement, and the summaries may have benefited from existing clinical summaries that were provided. Also, the GPT-4 model’s decisions lacked deterministic understandability and traceability, presenting a fundamental challenge known as the "black-box issue."

While the cohort was relatively small and composed only of patients with glioblastoma, the structured information basis of glioblastoma cases may provide a solid foundation for GPT-4 to generate comprehensive summaries of patient disease trajectories, the authors explained. They described the study as an example use case in summarizing and integrating clinical histories and imaging information that ultimately could streamline the longitudinal assessment process.

"Future research should explore the impact of GPT-4 and other large language model summaries across different specialties, tumor types, and structured reporting, including automated and standardized lesion measurements," Laukamp et al noted. "Existing limitations could be overcome by optimizing workflows that access information from various disciplines, and by fine-tuning GPT-4 through transfer learning, developing a pretrained language model from scratch specifically tailored for interdisciplinary medical summaries and patient monitoring."

The full journal article can be found here.